> I've worked a lot recently on the Hibernate Lucene integration. Here 
> are the concepts, the new features and the todo list.
> Please comment and give feedback.
>
> My work is committed in branches/Lucene_Integration because it will 
> probably need to be based on Hibernate 3.3.
>
> *Concepts*
> Each time you change an object's state, the Lucene index is updated and 
> kept in sync. This is done through the Hibernate event system.
> Whether an entity is indexed or not, and whether a property is indexed 
> or not, is defined through annotations.
> You can also search through your domain model using Lucene and 
> retrieve managed objects. The whole idea here is to provide a nice 
> integration between the search engine and the ORM without losing the 
> search engine's power, hence most of the Lucene API remains available. To sum up: 
> query Lucene, get managed objects back.
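To make the event-driven synchronization concrete, here is a minimal, self-contained sketch in plain Java with no Hibernate or Lucene dependency. The annotations (@Indexed, @DocumentId, @IndexedField), the InMemoryIndex map standing in for a Lucene index, and the onPostInsert/onPostUpdate/onPostDelete callbacks are all hypothetical stand-ins: they only mirror the idea of a listener plugged into the event system and driven by annotations, not the real listener interfaces or the annotations used in the branch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotations, defined here only for the sketch.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface Indexed {}                 // the entity class has an index

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface DocumentId {}              // identifier property, used as the document key

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface IndexedField {}            // property copied into the document

/** Example entity: internalNote is deliberately not indexed. */
@Indexed
class Book {
    @DocumentId Long id;
    @IndexedField String title;
    String internalNote;

    Book(Long id, String title, String internalNote) {
        this.id = id;
        this.title = title;
        this.internalNote = internalNote;
    }
}

/** Stand-in for a Lucene index: document id -> (field name -> value). */
class InMemoryIndex {
    final Map<Object, Map<String, String>> documents = new HashMap<>();
}

/** Plays the role of a listener registered for post-insert/update/delete events. */
class IndexSyncListener {
    private final InMemoryIndex index;

    IndexSyncListener(InMemoryIndex index) { this.index = index; }

    void onPostInsert(Object entity) { reindex(entity); }
    void onPostUpdate(Object entity) { reindex(entity); }
    void onPostDelete(Object entity) {
        if (isIndexed(entity)) index.documents.remove(idOf(entity));
    }

    private static boolean isIndexed(Object entity) {
        return entity.getClass().isAnnotationPresent(Indexed.class);
    }

    // A document is not changed in place: the old one is removed, a new one added.
    private void reindex(Object entity) {
        if (!isIndexed(entity)) return;   // non-indexed entities are ignored
        Object id = idOf(entity);
        index.documents.remove(id);
        index.documents.put(id, buildDocument(entity));
    }

    private static Object idOf(Object entity) {
        for (Field f : entity.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(DocumentId.class)) return read(f, entity);
        }
        throw new IllegalStateException("no @DocumentId on " + entity.getClass());
    }

    private static Map<String, String> buildDocument(Object entity) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Field f : entity.getClass().getDeclaredFields()) {
            Object value = f.isAnnotationPresent(IndexedField.class) ? read(f, entity) : null;
            if (value != null) doc.put(f.getName(), value.toString());
        }
        return doc;
    }

    private static Object read(Field f, Object target) {
        try {
            f.setAccessible(true);
            return f.get(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An update is handled as delete-then-add, since a Lucene document cannot be modified in place; properties without an annotation (internalNote here) never reach the index.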
>
> *Mapping*
> A given entity is mapped to an index. A Lucene index is stored in a 
> Directory; a Directory is Lucene's abstraction for an index storage 
> system. It can be a memory directory (RAMDirectory), a file system 
> directory (FSDirectory) or any other kind of backend. Hibernate Lucene 
> introduces the notion of a DirectoryProvider, which you can configure and 
> define on a per-entity basis (with a default used when none is defined). The 
> concept is very similar to ConnectionProvider.
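A possible shape for a per-entity DirectoryProvider with a fallback default, sketched in plain Java. IndexStore stands in for Lucene's Directory, and the interface, its method names and the property keys (<Entity>.directory_provider, default.directory_provider, indexBase) are illustrative assumptions, not the contract or configuration keys used in the branch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

/** Stand-in for Lucene's Directory abstraction. */
interface IndexStore {
    String describe();
}

/** Hypothetical provider contract: builds and owns the store of one entity. */
interface DirectoryProvider {
    void initialize(String entityName, Properties props);
    IndexStore getDirectory();
}

/** In-memory storage, playing the role of RAMDirectory. */
class RamDirectoryProvider implements DirectoryProvider {
    private IndexStore store;
    public void initialize(String entityName, Properties props) {
        store = () -> "ram:" + entityName;
    }
    public IndexStore getDirectory() { return store; }
}

/** File-system storage, playing the role of FSDirectory: one folder per entity. */
class FsDirectoryProvider implements DirectoryProvider {
    private IndexStore store;
    public void initialize(String entityName, Properties props) {
        String base = props.getProperty("indexBase", "./indexes");
        store = () -> "fs:" + base + "/" + entityName;
    }
    public IndexStore getDirectory() { return store; }
}

/** Resolves the provider for an entity from configuration. */
class DirectoryProviderFactory {
    private final Map<String, Supplier<DirectoryProvider>> known = new HashMap<>();

    DirectoryProviderFactory() {
        known.put("ram", RamDirectoryProvider::new);
        known.put("fs", FsDirectoryProvider::new);
    }

    DirectoryProvider forEntity(String entityName, Properties props) {
        // Per-entity setting first, then the global default, then in-memory.
        String key = props.getProperty(entityName + ".directory_provider",
                props.getProperty("default.directory_provider", "ram"));
        Supplier<DirectoryProvider> supplier = known.get(key);
        if (supplier == null) throw new IllegalArgumentException("unknown provider: " + key);
        DirectoryProvider provider = supplier.get();
        provider.initialize(entityName, props);
        return provider;
    }
}
```

As with ConnectionProvider, the calling code only ever sees the provider interface, so a new storage backend is one more implementation plus one configuration value.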
>
> Lucene only works with Strings, so you can define a @FieldBridge, which 
> transforms a Java property into a Lucene Field (and potentially 
> vice versa). A simpler (and perhaps more useful) version, StringBridge, 
> handles the transformation of a Java property into a String.
> Some built-in FieldBridges exist. @FieldBridge is very much like a 
> Hibernate Type. In particular, I introduced the notion of precision for 
> dates (year, month, ..., second, millisecond). FieldBridge and 
> StringBridge give a lot of flexibility in the way property indexing 
> is designed.
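A sketch of the StringBridge idea, including date precision, in plain Java without Lucene. The StringBridge interface, the Resolution enum and the DateBridge/PaddedLongBridge classes are hypothetical names chosen for illustration. Dates are formatted in UTC with a pattern cut off at the chosen precision, so two dates that differ only below that precision produce the same indexed term.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Simplest bridge: Java property value -> indexable String. */
interface StringBridge {
    String objectToString(Object value);
}

/** Precision kept when a date is indexed; each pattern adds one unit to the previous. */
enum Resolution {
    YEAR("yyyy"), MONTH("yyyyMM"), DAY("yyyyMMdd"), HOUR("yyyyMMddHH"),
    MINUTE("yyyyMMddHHmm"), SECOND("yyyyMMddHHmmss"), MILLISECOND("yyyyMMddHHmmssSSS");

    final String pattern;
    Resolution(String pattern) { this.pattern = pattern; }
}

/** Dates become fixed-width strings, so lexical order matches chronological order. */
class DateBridge implements StringBridge {
    private final Resolution resolution;

    DateBridge(Resolution resolution) { this.resolution = resolution; }

    @Override
    public String objectToString(Object value) {
        if (value == null) return null;
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat(resolution.pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // same term whatever the server zone
        return format.format((Date) value);
    }
}

/** Zero-padded numbers sort correctly as strings (non-negative values only). */
class PaddedLongBridge implements StringBridge {
    private final int width;

    PaddedLongBridge(int width) { this.width = width; }

    @Override
    public String objectToString(Object value) {
        return value == null ? null : String.format(Locale.ROOT, "%0" + width + "d", (Long) value);
    }
}
```

Because both bridges emit fixed-width strings, range queries over the indexed terms behave as expected; a coarser resolution also keeps the number of distinct terms down.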
>
>
> *Querying*
> I've introduced the notion of a LuceneSession, which implements Session 
> and actually delegates to a regular Hibernate Session. This Lucene 
> session has a /createLuceneQuery()/ method and an /index()/ method.
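The delegation model can be sketched with a deliberately tiny stand-in for Session (SimpleSession, three methods); the real org.hibernate.Session has many more methods, each needing the same forwarding. The "index" here is a map of lowercased toString() values and createLuceneQuery takes a plain term instead of a Lucene query; both are hypothetical simplifications meant only to show "query the index, get managed objects back" through a wrapped session.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Tiny stand-in for org.hibernate.Session (hypothetical, three methods only). */
interface SimpleSession {
    void save(Object id, Object entity);
    Object get(Object id);
    void flush();
}

/** A plain session implementation, i.e. the object being wrapped. */
class MapSession implements SimpleSession {
    final Map<Object, Object> store = new HashMap<>();
    int flushCount;

    @Override public void save(Object id, Object entity) { store.put(id, entity); }
    @Override public Object get(Object id) { return store.get(id); }
    @Override public void flush() { flushCount++; }
}

/** Implements the session contract by forwarding, and adds search on top. */
class LuceneSession implements SimpleSession {
    private final SimpleSession delegate;
    // Stand-in for the Lucene index: entity id -> indexed text.
    private final Map<Object, String> index = new LinkedHashMap<>();

    LuceneSession(SimpleSession delegate) { this.delegate = delegate; }

    @Override public void save(Object id, Object entity) {
        delegate.save(id, entity);
        index.put(id, entity.toString().toLowerCase(Locale.ROOT)); // keep the index in sync
    }
    @Override public Object get(Object id) { return delegate.get(id); }
    @Override public void flush() { delegate.flush(); }

    /** Query the index, then load the hits through the wrapped session. */
    List<Object> createLuceneQuery(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Object> results = new ArrayList<>();
        for (Map.Entry<Object, String> entry : index.entrySet()) {
            if (entry.getValue().contains(needle)) results.add(delegate.get(entry.getKey()));
        }
        return results;
    }
}
```

Since LuceneSession only holds a reference to the wrapped session, any Session implementation can be wrapped; the cost is that state such as operation queues has to live in the wrapper or somewhere the wrapped session exposes.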
>
> /session.createLuceneQuery(lucene.Query, Class[])/ takes a Lucene 
> query and the targeted entity classes as parameters. Using a Lucene 
> query as a parameter gives the full Lucene flexibility (no abstraction 
> on top of it). An /org.hibernate.Query/ object is returned.
> You can (in fact, must) use pagination. A Lucene query also returns the 
> number of matching results (regardless of the pagination) through 
> query.resultSize(), a sort of count(*).
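The relation between resultSize() and pagination can be sketched as follows, assuming the Lucene search has already produced the ordered list of matching ids. LuceneQuerySketch and pageIds() are hypothetical; setFirstResult/setMaxResults borrow their names from org.hibernate.Query, but this is not that implementation.

```java
import java.util.Collections;
import java.util.List;

/** Sketch of the query object returned by createLuceneQuery(...). */
class LuceneQuerySketch {
    private final List<Object> matchingIds;   // ids of all hits, in score order
    private int firstResult = 0;
    private int maxResults = Integer.MAX_VALUE;

    LuceneQuerySketch(List<Object> matchingIds) { this.matchingIds = matchingIds; }

    LuceneQuerySketch setFirstResult(int first) { this.firstResult = first; return this; }
    LuceneQuerySketch setMaxResults(int max) { this.maxResults = max; return this; }

    /** Total number of hits, ignoring pagination: a cheap count(*). */
    int resultSize() { return matchingIds.size(); }

    /** Ids on the current page only; only these would be loaded from the database. */
    List<Object> pageIds() {
        int from = Math.min(firstResult, matchingIds.size());
        int to = (int) Math.min((long) from + maxResults, matchingIds.size());
        return Collections.unmodifiableList(matchingIds.subList(from, to));
    }
}
```

The count comes from the index alone, so showing "results 11 to 20 of 25" costs one page of entity loading rather than 25.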
>    /list()/ returns the list of matching objects. It heavily depends 
> on batch-size to be efficient (i.e. proxies are created for all the 
> results and then initialized).
> There might be alternative strategies here (select ... where id in ( , 
> , , ) ), but the real benefit would come if combined with the dynamic 
> fetching profile we talked about a while ago.
>    /iterate()/ has the same semantics as the regular Hibernate method, 
> meaning the objects are initialized one by one.
>    /scroll()/ allows efficient navigation through the result set 
> (objects are loaded one by one, though).
> Lucene search and dynamic fetch profiles would definitely be a killer 
> pair (search the Lucene index, then fetch the appropriate object graph).
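To compare the loading strategies, the following sketch counts database round trips. CountingLoader is a hypothetical stand-in for the persister, and each loadBatch call represents one select ... where id in (...) query; list() groups the hit ids by a batch size, while iterate() issues one query per object as it is consumed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Counts round trips so the two strategies can be compared. */
class CountingLoader {
    int queries;                                    // number of round trips issued
    private final Function<Object, Object> rowFor;  // fake "database": id -> loaded object

    CountingLoader(Function<Object, Object> rowFor) { this.rowFor = rowFor; }

    /** Stands for one query: select ... where id in (?, ?, ...). */
    List<Object> loadBatch(List<Object> ids) {
        queries++;
        List<Object> rows = new ArrayList<>(ids.size());
        for (Object id : ids) rows.add(rowFor.apply(id));
        return rows;
    }
}

class ResultLoading {
    /** list(): load everything, batchSize ids per query, preserving hit order. */
    static List<Object> list(List<Object> hitIds, CountingLoader loader, int batchSize) {
        List<Object> result = new ArrayList<>(hitIds.size());
        for (int from = 0; from < hitIds.size(); from += batchSize) {
            int to = Math.min(from + batchSize, hitIds.size());
            result.addAll(loader.loadBatch(hitIds.subList(from, to)));
        }
        return result;
    }

    /** iterate(): load each object lazily, one query per object. */
    static Iterator<Object> iterate(List<Object> hitIds, CountingLoader loader) {
        return new Iterator<Object>() {
            private int position = 0;

            @Override public boolean hasNext() { return position < hitIds.size(); }

            @Override public Object next() {
                if (!hasNext()) throw new NoSuchElementException();
                Object id = hitIds.get(position++);
                return loader.loadBatch(Collections.singletonList(id)).get(0);
            }
        };
    }
}
```

With 25 hits and a batch size of 10, list() needs 3 queries while fully consuming iterate() needs 25, which is why list() relies so much on batch-size.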
>
> /session.index(Object)/ is currently not implemented; it requires 
> some modifications of SessionImpl or of LuceneSession. This feature is 
> useful to initialize or refresh the index in batch (i.e. loading 
> the data and applying the indexing process to this set of data).
> Basically, the object is added to the index queue. At flush() time, the 
> queue is processed.
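A minimal sketch of the queue behind session.index(Object), assuming a per-session queue and a batchWriter callback that stands in for whatever opens a Lucene IndexWriter and adds the documents. IndexQueue and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Per-session queue: index() only records work, flush() applies it as one batch. */
class IndexQueue {
    private final List<Object> pending = new ArrayList<>();
    // Stand-in for code that opens an IndexWriter, adds all documents, then closes it.
    private final Consumer<List<Object>> batchWriter;

    IndexQueue(Consumer<List<Object>> batchWriter) { this.batchWriter = batchWriter; }

    /** What session.index(entity) would do: cheap, the index is not touched yet. */
    void index(Object entity) { pending.add(entity); }

    int pendingCount() { return pending.size(); }

    /** Called at flush() time: one batch write, then the queue is reset. */
    void flush() {
        if (pending.isEmpty()) return;
        batchWriter.accept(Collections.unmodifiableList(new ArrayList<>(pending)));
        pending.clear(); // cleared only after the write succeeded
    }
}
```

For a batch (re)indexing run, one would load the entities chunk by chunk, call index() on each, and flush() after every chunk, so each chunk becomes a single batched index write.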
>
> Design considerations:
> The delegation vs. subclassing strategy for LuceneSession (i.e. 
> LuceneSession delegating to a regular Session, allowing simple wrapping, 
> or LuceneSessionImpl being a subclass of SessionImpl) is an ongoing 
> discussion.
> Using a subclassing model would allow the LuceneSession to keep 
> operation queues (for batch indexing, either through object changes or 
> through session.index()), but it does not allow a potential 
> Hibernate-XXX integration built on the same subclassing model (the two 
> could not be combined). Batching is essential in Lucene for performance 
> reasons.
> Using the delegation model requires some SessionImpl modifications to 
> be able to keep track of a generic context. This context will keep the 
> operation queues.
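One way the generic context for the delegation model could look: SessionImpl would expose a small typed map of per-session contexts, and a wrapper stores its operation queues there instead of subclassing. ContextHoldingSession, getContext(), FlushAware and LuceneOperationQueue are hypothetical names for the SessionImpl modification discussed above, not existing Hibernate API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** A context that wants to run work when its owning session flushes. */
interface FlushAware {
    void beforeFlush();
}

/** The SessionImpl side: a typed bag of per-session contexts (hypothetical). */
class ContextHoldingSession {
    private final Map<Class<?>, Object> contexts = new LinkedHashMap<>();

    /** Returns this session's context of the given type, creating it on first use. */
    <T> T getContext(Class<T> type, Supplier<? extends T> factory) {
        return type.cast(contexts.computeIfAbsent(type, k -> factory.get()));
    }

    void flush() {
        for (Object context : contexts.values()) {
            if (context instanceof FlushAware) ((FlushAware) context).beforeFlush();
        }
        // ...the regular entity flush would happen here.
    }
}

/** What a delegating LuceneSession could keep in that bag. */
class LuceneOperationQueue implements FlushAware {
    final List<Object> pending = new ArrayList<>();
    int batchesWritten;

    @Override
    public void beforeFlush() {
        if (pending.isEmpty()) return;
        batchesWritten++;        // stands for one batched index write
        pending.clear();
    }
}
```

Since contexts are keyed by type, a Lucene integration and some other Hibernate-XXX integration could each attach their own context to the same session, which is exactly what two competing subclasses of SessionImpl cannot do.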
>
>
> *ToDo*
> Argue over the LuceneSession design and pick one (Steve/Emmanuel; feel 
> free to join the dance)
>
> Find a way to keep the DocumentBuilder (sort of EntityPersister) at 
> the SessionFactory level rather than the EventListener level 
> (Steve/Emmanuel)
>
> Implement the use of FieldBridge for all properties. It is currently 
> used for the id property only (trivial).
>
> Batch changes: to do that I need to be able to keep a session-related 
> queue of all insert/update changes. I can't in the current design 
> because SessionImpl does not have such a concept and because the 
> LuceneSession is built on the delegation model. We need to discuss the 
> strategy here (delegation vs. subclassing).
>
> Massive batch changes: in some systems, we don't really care about 
> "real-time" index synchronization. For those, a common batch change 
> queue (i.e. shared across several sessions) would make sense, with the 
> queue flushed every hour, for example.
>
> Clustered Directory: think about it. A JDBC Directory might not 
> actually be the perfect solution.
>
> Dynamic fetch profiles
>
> Align the field indexing annotations to Lucene 2.0
>
> Think about Analyzer, to give the same flexibility @Boost provides
>
> Make Lucene queries parameterizable: query.setParameter();
>
> Implement additional strategies to load objects on query.list()
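For the "massive batch changes" item, a queue shared across sessions and flushed on a timer could be sketched with java.util.concurrent. SharedIndexQueue is hypothetical: sessions would call enqueue() at flush time instead of writing to the index, and a daemon ScheduledExecutorService drains the queue periodically, e.g. start(1, TimeUnit.HOURS).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** One change queue shared by all sessions, flushed on a timer rather than per session. */
class SharedIndexQueue implements AutoCloseable {
    private final Queue<Object> pending = new ConcurrentLinkedQueue<>();
    private final Consumer<List<Object>> batchWriter;
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "lucene-index-flush");
        t.setDaemon(true);   // never keeps the JVM alive on its own
        return t;
    });

    SharedIndexQueue(Consumer<List<Object>> batchWriter) { this.batchWriter = batchWriter; }

    /** Called by any session at flush time instead of writing to the index. */
    void enqueue(Object change) { pending.add(change); }

    /** Schedules periodic draining, e.g. start(1, TimeUnit.HOURS). */
    void start(long period, TimeUnit unit) {
        timer.scheduleAtFixedRate(this::drain, period, period, unit);
    }

    /** Turns everything queued so far into a single batch write. */
    synchronized void drain() {
        List<Object> batch = new ArrayList<>();
        Object change;
        while ((change = pending.poll()) != null) batch.add(change);
        if (!batch.isEmpty()) batchWriter.accept(batch);
    }

    /** Stops the timer and writes what is left, so queued changes survive a clean shutdown. */
    @Override
    public void close() {
        timer.shutdown();
        drain();
    }
}
```

The trade-off is that search results can lag up to one period behind the database, and changes still queued in memory are lost if the JVM crashes; close() only covers a clean shutdown.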
>



_______________________________________________
hibernate-devel mailing list
hibernate-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/hibernate-devel
