[Resent with better formatting]

I've worked a lot recently on the Hibernate Lucene integration. Here are the concepts, the new features and the todo list. Please comment and give feedback.
My work is committed in branches/Lucene_Integration because we'll probably need to be based on Hibernate 3.3.

*Concepts*

Each time you change an object's state, the Lucene index is updated and kept in sync. This is done through the Hibernate event system. Whether an entity is indexed or not, and whether a property is indexed or not, is defined through annotations. You can also search through your domain model using Lucene and retrieve managed objects. The whole idea here is to do a nice integration between the search engine and the ORM without losing the search engine's power, hence most of the API remains. To sum up: query Lucene, get managed objects back.

*Mapping*

A given entity is mapped to an index. A Lucene index is stored in a Directory; a Directory is Lucene's abstraction for an index storage system. It can be a memory directory (RAMDirectory), a file system directory (FSDirectory) or any other kind of backend. Hibernate Lucene introduces the notion of DirectoryProvider, which you can configure and define on a per-entity basis (and which has a default). The concept is very similar to ConnectionProvider.

Lucene only works with Strings, so you can define a @FieldBridge which transforms a Java property into a Lucene Field (and potentially vice versa). A simpler (useful?) version handles the transformation of a Java property into a String. Some built-in FieldBridges exist. @FieldBridge is very much like a Hibernate Type. In particular, I introduced the notion of precision in dates (year, month, ... second, millisecond). FieldBridge and StringBridge give a lot of flexibility in how property indexing is designed.

*Querying*

I've introduced the notion of LuceneSession, which implements Session and actually delegates to a regular Hibernate Session. This Lucene session has a /createLuceneQuery()/ method and an /index()/ method.

/session.createLuceneQuery(lucene.Query, Class[])/ takes a Lucene query as a parameter and the list of targeted entities.
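To illustrate the date precision idea from the Mapping section, here is a minimal sketch of a String-producing bridge. It is not the actual Hibernate Lucene code: the names DatePrecision, StringBridge, DateStringBridge and objectToString are hypothetical, and java.time is used only to keep the example short. Each precision maps to a fixed-width, zero-padded pattern, so the resulting strings sort in chronological order.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical precision levels, mirroring the ones named in the Mapping section. */
enum DatePrecision {
    YEAR("yyyy"),
    MONTH("yyyyMM"),
    DAY("yyyyMMdd"),
    HOUR("yyyyMMddHH"),
    MINUTE("yyyyMMddHHmm"),
    SECOND("yyyyMMddHHmmss"),
    MILLISECOND("yyyyMMddHHmmssSSS");

    final DateTimeFormatter formatter;

    DatePrecision(String pattern) {
        this.formatter = DateTimeFormatter.ofPattern(pattern);
    }
}

/** Hypothetical contract: turn a property value into an indexable String. */
interface StringBridge<T> {
    String objectToString(T value);
}

/** Drops everything finer than the configured precision when formatting. */
final class DateStringBridge implements StringBridge<LocalDateTime> {
    private final DatePrecision precision;

    DateStringBridge(DatePrecision precision) {
        this.precision = precision;
    }

    @Override
    public String objectToString(LocalDateTime value) {
        // A null property yields no indexed value.
        return value == null ? null : precision.formatter.format(value);
    }
}
```

For example, new DateStringBridge(DatePrecision.MONTH).objectToString(LocalDateTime.of(2006, 5, 17, 14, 30)) gives "200605". A coarser precision produces fewer distinct terms in the index, at the cost of finer-grained date searches.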
Using a Lucene query as a parameter gives the full Lucene flexibility (no abstraction on top of it). An /org.hibernate.Query/ object is returned. You can (must) use pagination. A Lucene query also returns the number of matching results (regardless of the pagination): query.resultSize(), a sort of count(*).

/list()/ returns the list of matching objects. It heavily depends on batch-size to be efficient (ie the proxies are created for all the results and then we initialize them). There might be alternative strategies here (select ... where id in ( , , , ) ), but the real benefit would come if combined with the dynamic fetching profile we talked about a while ago.

/iterate()/ has the same semantics as the regular method in Hibernate, meaning it initializes the objects one by one.

/scroll()/ allows efficient navigation through the result set (objects are loaded one by one though).

Having the dynamic fetch profile would definitely make a killer pair (searching the Lucene index, and fetching the appropriate object graph).

/session.index(Object)/ is currently not implemented; it requires some modifications of SessionImpl or of LuceneSession. This feature is useful to initialize / refresh the index in a batch way (ie loading the data and applying the indexing process on this set of data). Basically the object is added to the index queue. At flush() time, the queue is processed.

Design considerations:

The delegation vs subclassing strategy for LuceneSession (ie LuceneSession delegating to a regular Session, allowing simple wrapping, or LuceneSessionImpl being a subclass of SessionImpl) is an ongoing discussion. Using a subclassing model would allow the LuceneSession to keep operation queues (for batch indexing either through object changes or through session.index() ), but it does not allow a potential Hibernate - XXX integration on the same subclassing model. Batching is essential in Lucene for performance reasons.
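The queue-then-flush behaviour described for index() can be sketched under the delegation model as follows. This is only an illustration: SimpleSession, IndexSink, QueueingLuceneSession and pending() are hypothetical stand-ins, not Hibernate or Lucene types, and the real Session interface is far larger.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the small part of Session this sketch needs. */
interface SimpleSession {
    void flush();
}

/** Hypothetical stand-in for whatever writes a batch of entities to the index. */
interface IndexSink {
    void addAll(List<Object> entities);
}

/** Delegation model: wraps a session and keeps its own queue of entities to index. */
final class QueueingLuceneSession implements SimpleSession {
    private final SimpleSession delegate;
    private final IndexSink sink;
    private final List<Object> indexQueue = new ArrayList<>();

    QueueingLuceneSession(SimpleSession delegate, IndexSink sink) {
        this.delegate = delegate;
        this.sink = sink;
    }

    /** Only queues the entity; nothing reaches the index before flush(). */
    void index(Object entity) {
        indexQueue.add(entity);
    }

    /** Number of entities waiting to be indexed. */
    int pending() {
        return indexQueue.size();
    }

    @Override
    public void flush() {
        // Flush the wrapped session first, then hand the whole queue over as one batch.
        delegate.flush();
        if (!indexQueue.isEmpty()) {
            sink.addAll(new ArrayList<>(indexQueue));
            indexQueue.clear();
        }
    }
}
```

In this sketch the wrapper only sees explicit index() calls. Changes detected by the wrapped session itself never reach the wrapper's queue, which is the limitation that motivates the delegation vs subclassing discussion.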
Using the delegation model requires some SessionImpl modifications to be able to keep track of a generic context. This context will keep the operation queues.

*ToDo*

- Argue on the LuceneSession design and pick one (Steve/Emmanuel/Feel free to join the dance)
- Find a way to keep the DocumentBuilder (sort of EntityPersister) at the SessionFactory level rather than the EventListener level (Steve/Emmanuel)
- Implement the use of FieldBridge for all properties. It is currently used for the id property only (trivial).
- Batch changes: to do that I need to be able to keep a session-related queue of all insert/update changes. I can't in the current design because SessionImpl does not have such a concept and because the LuceneSession is built on the delegation model. We need to discuss the strategy here (delegation vs subclassing).
- Massive batch changes: in some systems, we don't really bother with "real time" index synchronization; for those, a common batch changes queue (ie across several sessions) would make sense, with a queue flush happening every hour for example.
- Clustered Directory: think about that. A JDBC Directory might not be the perfect solution actually.
- Fetch profile
- Align the field indexing annotations to Lucene 2.0
- Think about Analyzer to give the same flexibility @Boost provides
- Make Lucene query parameterizable: query.setParameter();
- Implement additional strategies to load objects on query.list()

_______________________________________________
hibernate-devel mailing list
hibernate-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/hibernate-devel