> I've worked a lot recently on the Hibernate Lucene integration. Here
> are the concepts, the new features and the todo list.
> Please comment and give feedback.
>
> My work is committed in branches/Lucene_Integration because we'll
> probably need to be based on Hibernate 3.3.
>
> *Concepts*
> Each time you change an object's state, the Lucene index is updated
> and kept in sync. This is done through the Hibernate event system.
> Whether an entity is indexed or not, and whether a property is indexed
> or not, is defined through annotations.
> You can also search through your domain model using Lucene and
> retrieve managed objects. The whole idea here is to do a nice
> integration between the search engine and the ORM without losing the
> search engine's power, hence most of its API remains. To sum up: query
> Lucene, get managed objects back.
>
> *Mapping*
> A given entity is mapped to an index. A Lucene index is stored in a
> Directory; a Directory is a Lucene abstraction for an index storage
> system. It can be a memory directory (RAMDirectory), a file system
> directory (FSDirectory) or any other kind of backend. Hibernate Lucene
> introduces the notion of DirectoryProvider, which you can configure
> and define on a per-entity basis (and which has a default). The
> concept is very similar to ConnectionProvider.
>
> Lucene only works with Strings, so you can define a @FieldBridge which
> transforms a Java property into a Lucene Field (and potentially
> vice versa). A simpler (more useful?) version handles the
> transformation of a Java property into a String.
> Some built-in FieldBridges exist. @FieldBridge is very much like a
> Hibernate Type. In particular, I introduced the notion of precision
> in dates (year, month, ... second, millisecond). FieldBridge and
> StringBridge give a lot of flexibility in the way property indexing
> is designed.
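
To make the date precision idea concrete, here is a minimal sketch of a property-to-String bridge. It is not the code from the branch: the names `StringBridgeSketch`, `Resolution` and `DateBridgeSketch` are hypothetical, and it assumes a modern JDK with `java.time` rather than the date classes the integration actually uses.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical interface: turns a Java property value into the String that gets indexed.
interface StringBridgeSketch<T> {
    String objectToString(T value);
}

// Hypothetical precision levels, each mapped to a fixed-width date pattern.
enum Resolution {
    YEAR("yyyy"),
    MONTH("yyyyMM"),
    DAY("yyyyMMdd"),
    HOUR("yyyyMMddHH"),
    MINUTE("yyyyMMddHHmm"),
    SECOND("yyyyMMddHHmmss"),
    MILLISECOND("yyyyMMddHHmmssSSS");

    final String pattern;

    Resolution(String pattern) {
        this.pattern = pattern;
    }
}

// Formats a date down to the configured precision; finer-grained fields are dropped.
final class DateBridgeSketch implements StringBridgeSketch<LocalDateTime> {
    private final DateTimeFormatter formatter;

    DateBridgeSketch(Resolution resolution) {
        this.formatter = DateTimeFormatter.ofPattern(resolution.pattern);
    }

    @Override
    public String objectToString(LocalDateTime value) {
        return value == null ? null : formatter.format(value);
    }
}
```

With fixed-width, zero-padded patterns like these, a coarser precision simply yields a shorter term, and for four-digit years the terms of one precision sort in chronological order, which is presumably what makes them usable in range queries.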
>
>
> *Querying*
> I've introduced the notion of LuceneSession, which implements Session
> and actually delegates to a regular Hibernate Session. This Lucene
> session has a /createLuceneQuery()/ method and an /index()/ method.
>
> /session.createLuceneQuery(lucene.Query, Class[])/ takes a Lucene
> query as a parameter and the list of targeted entities. Using a Lucene
> query as a parameter gives the full Lucene flexibility (no abstraction
> on top of it). An /org.hibernate.Query/ object is returned.
> You can (must) use pagination. A Lucene query also returns the number
> of matching results (regardless of the pagination): query.resultSize(),
> a sort of count(*).
> /list()/ returns the list of matching objects. It heavily depends
> on batch-size to be efficient (i.e. the proxies are created for all
> the results and then we initialize them).
> There might be alternative strategies here (select ... where id in ( ,
> , , ) ), but the real benefit would come if combined with the dynamic
> fetching profile we talked about a while ago.
> /iterate()/ has the same semantics as the regular method in
> Hibernate, meaning it initializes the objects one by one.
> /scroll()/ allows efficient navigation through the result set
> (objects are loaded one by one though).
> Having the dynamic fetch profile would definitely make a killer pair
> (searching the Lucene index, and fetching the appropriate object graph).
>
> /session.index(Object)/ is currently not implemented; it requires
> some modifications of SessionImpl or of LuceneSession. This feature is
> useful to initialize / refresh the index in a batch way (i.e. loading
> the data and applying the indexing process on this set of data).
> Basically the object is added to the index queue. At flush() time, the
> queue is processed.
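
The queue-then-flush behaviour described for /index()/ can be sketched in a few lines. This is only an illustration under stated assumptions: `MiniSession` is a hypothetical stand-in for the Hibernate Session interface, `IndexingSessionSketch` is a hypothetical wrapper, and a plain list stands in for the Lucene index. It covers explicit index() calls only and does not address changes detected by the event listeners.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the part of the Session contract that matters here.
interface MiniSession {
    void flush();
}

// Wrapper that delegates to a regular session and keeps its own index queue.
final class IndexingSessionSketch implements MiniSession {
    private final MiniSession delegate;
    private final List<Object> indexQueue = new ArrayList<>();
    // Stand-in for the Lucene index: records what has been "written".
    private final List<Object> indexed = new ArrayList<>();

    IndexingSessionSketch(MiniSession delegate) {
        this.delegate = delegate;
    }

    // Nothing is written yet; the entity is only queued.
    void index(Object entity) {
        indexQueue.add(entity);
    }

    // Flush the underlying session first, then process the whole queue in one batch.
    @Override
    public void flush() {
        delegate.flush();
        indexed.addAll(indexQueue);
        indexQueue.clear();
    }

    int pendingCount() {
        return indexQueue.size();
    }

    List<Object> indexedEntities() {
        return Collections.unmodifiableList(indexed);
    }
}
```

Processing the queue once per flush() is what allows the index writes to be batched, which the mail names as essential for Lucene performance.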
>
> design considerations:
> The delegation vs subclassing strategy for LuceneSession (i.e.
> LuceneSession delegating to a regular Session, allowing simple
> wrapping, or LuceneSessionImpl being a subclass of SessionImpl) is an
> ongoing discussion.
> Using a subclassing model would allow the LuceneSession to keep
> operation queues (for batch indexing either through object changes or
> through session.index() ), but it does not allow a potential Hibernate
> - XXX integration on the same subclassing model. Batching is essential
> in Lucene for performance reasons.
> Using the delegation model requires some SessionImpl modifications to
> be able to keep track of a generic context. This context will keep the
> operation queues.
>
>
> *ToDo*
> Argue on the LuceneSession design and pick one (Steve/Emmanuel/feel
> free to join the dance)
>
> Find a way to keep the DocumentBuilder (sort of EntityPersister) at
> the SessionFactory level rather than the EventListener level
> (Steve/Emmanuel)
>
> Implement the use of FieldBridge for all properties. It is currently
> used for the id property only (trivial).
>
> Batch changes: to do that I need to be able to keep a session-related
> queue of all insert/update changes. I can't in the current design
> because SessionImpl does not have such a concept and because the
> LuceneSession is built on the delegation model. We need to discuss the
> strategy here (delegation vs subclassing)
>
> Massive batch changes: in some systems, we don't really bother with
> "real time" index synchronization; for those, a common batch changes
> queue (i.e. across several sessions) would make sense, with a queue
> flush happening every hour for example.
>
> Clustered Directory: think about that. A JDBC Directory might not be
> the perfect solution actually.
>
> Fetch profile
>
> Align the field indexing annotations to Lucene 2.0
>
> Think about Analyzer, to give the same flexibility @Boost provides
>
> Make the Lucene query parameterizable: query.setParameter();
>
> Implement additional strategies to load objects on query.list()
>
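
On the last item, the "select ... where id in (...)" alternative mentioned under Querying would need the matching ids split into bounded groups, one IN-list per group. A minimal sketch of that chunking step follows; `IdChunkerSketch` is a hypothetical helper and the chunk size is an assumption left to the caller, since nothing in the mail fixes one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the ids returned by the search into bounded groups,
// each of which could back one "where id in (...)" query.
final class IdChunkerSketch {
    private IdChunkerSketch() {
    }

    static <T> List<List<T>> chunk(List<T> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }
}
```

Each chunk keeps the order of the Lucene hits, but rows returned by an IN query come back in no particular order, so the loaded objects would have to be re-sorted against the id list before list() returns them.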