Re: [Hibernate] Hibernate Lucene integration

Max Rydahl Andersen Fri, 14 Jul 2006 02:09:14 -0700

Hi Emmanuel,

Here are my comments (sorry if something is obvious from looking at the
code, but I haven't had time to look into the details yet)


> *Concepts*
> Each time you change an object state, the lucene index is updated and
> kept in sync. This is done through the Hibernate event system.

Ok - sounds cool. Is the index updated at flush or commit time? (I assume
commit)

> Whether an entity is indexed or not and whether a property is indexed or
> not is defined through annotations.

Any defaults?

> You can also search through your domain model using Lucene and retrieve
> managed objects. The whole idea here is to do a nice integration between
> the search engine and the ORM without losing the search engine power,
> hence most of the API remains. To sum up, query Lucene, get managed
> objects back.

Cool.

> *Mapping*
> A given entity is mapped to an index. A Lucene index is stored in a
> Directory; a Directory is Lucene's abstraction for an index storage
> system. It can be a memory directory (RAMDirectory), a file system
> directory (FSDirectory) or any other kind of backend. Hibernate Lucene
> introduces the notion of DirectoryProvider that you can configure and
> define on a per entity basis (and which is defaulted defaulted). The
> concept is very similar to ConnectionProvider.

"defaulted defaulted"? (Defaulted to RAMDirectory maybe?)

> Lucene only works with Strings, so you can define a @FieldBridge which
> transforms a Java property into a Lucene Field (and potentially
> vice-versa). A simpler (useful?) version handles the transformation
> of a Java property into a String.
> Some built-in FieldBridges exist. @FieldBridge is very much like a
> Hibernate Type. In particular, I introduced the notion of precision in
> dates (year, month, ..., second, millisecond). FieldBridge and
> StringBridge give a lot of flexibility in how property indexing is
> designed.

Sounds like a good thing.
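
Just to check that I read the date-precision idea correctly, here is a
minimal sketch of what I imagine such a bridge doing. All names in it
(StringBridge as an interface, Resolution, DateBridge, objectToString)
are my own guesses for illustration and not the actual Hibernate Lucene
API; the UTC time zone and the digit-only patterns are assumptions too.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical contract: turn a Java property value into an indexable String.
interface StringBridge {
    String objectToString(Object value);
}

// Hypothetical precision levels; each one maps to a truncated date pattern.
enum Resolution {
    YEAR("yyyy"),
    MONTH("yyyyMM"),
    DAY("yyyyMMdd"),
    HOUR("yyyyMMddHH"),
    MINUTE("yyyyMMddHHmm"),
    SECOND("yyyyMMddHHmmss"),
    MILLISECOND("yyyyMMddHHmmssSSS");

    final String pattern;

    Resolution(String pattern) {
        this.pattern = pattern;
    }
}

// Formats a java.util.Date down to the configured precision.
final class DateBridge implements StringBridge {
    private final Resolution resolution;

    DateBridge(Resolution resolution) {
        this.resolution = resolution;
    }

    @Override
    public String objectToString(Object value) {
        if (value == null) {
            return null;
        }
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat format = new SimpleDateFormat(resolution.pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: index in UTC
        return format.format((Date) value);
    }
}
```

With patterns like these the strings sort lexicographically in date order,
which is presumably what you want for range queries on the indexed field.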

> *Querying*
> I've introduced the notion of LuceneSession which implements Session and
> actually delegates to a regular Hibernate Session. This lucene session
> has a /createLuceneQuery()/ method and a /index()/ method.
>
> /session.createLuceneQuery(lucene.Query, Class[])/ takes a Lucene query
> as a parameter and the list of targeted entities. Using a Lucene query
> as a parameter gives the full Lucene flexibility (no abstraction on top
> of it). An /org.hibernate.Query/ object is returned.
> You can (must) use pagination. A Lucene query also returns the number of
> matching results (regardless of the pagination): query.resultSize(), a
> sort of count(*).

Is there any way to get to the underlying Lucene result?
As far as I remember, Lucene also has some notion of result relevance and
ordering, which could be relevant to reach?

> Having the dynamic fetch profile would definitely be a killer pair
> (searching the lucene index, and fetching the appropriate object graph)

+1000 ;)

> /session.index(Object)/ is currently not implemented; it requires some
> modifications of SessionImpl or of LuceneSession. This feature is useful
> to initialize / refresh the index in a batch way (i.e. loading the data
> and applying the indexing process to this set of data).
> Basically the object is added to the index queue. At flush() time, the
> queue is processed.

Hmm... why is this specific operation needed if indexing is done
automatically on object changes?

And if it is there so that users can index not-yet-indexed objects,
couldn't it be a flag or something on the LuceneQuery?

e.g. s.createLuceneQuery("from X as x where x....").setIndex(true) or
maybe .setIndex(IndexMode.ONLY_NEW);

> Design considerations:
> The delegation vs subclassing strategy for LuceneSession (i.e.
> LuceneSession delegating to a regular Session, allowing simple wrapping,
> or LuceneSessionImpl being a subclass of SessionImpl) is an ongoing
> discussion.

> Using a subclassing model would allow the LuceneSession to keep
> operation queues (for batch indexing either through object changes or
> through session.index() ), but it does not allow a potential Hibernate -
> XXX integration on the same subclassing model. Batching is essential in
> Lucene for performance reasons.
> Using the delegation model requires some SessionImpl modifications to be
> able to keep track of a generic context. This context will keep the
> operation queues.
>
>
> *ToDo*
> Argue about the LuceneSession design and pick one (Steve/Emmanuel/feel
> free to join the dance)

I vote for an impl that will allow an existing Session to be the basis of
extension, thus not having the Lucene integration be a hardcoded
subclass... we did the same for Configuration and that is
smelly/inflexible.

We should open enough of the session up to allow such delegation.

This might be extremely hard and close to impossible, but that is what I
wish for ;)

> Find a way to keep the DocumentBuilder (sort of EntityPersister) at the
> SessionFactory level rather than the EventListener level (Steve/Emmanuel)

Finding a way of storing structured info/data relative to some of the
core concepts in SF would be useful for things other than the Lucene
integration (e.g. other search, query, tooling impls etc.)

> Batch changes: to do that I need to be able to keep a session related
> queue of all insert/update changes. I can't in the current design
> because SessionImpl does not have such a concept and because the
> LuceneSession is built on the delegation model. We need to discuss the
> strategy here (delegation vs subclassing)

Aren't there three strategies?

The current one is LuceneSession delegates to Session

The other one is LuceneSession extends Session

The one I see as third is that LuceneSession delegates to Session, but on
that Session we install callbacks so the LuceneSession (and friends) can
maintain/participate in some of these state handling scenarios?

(maybe that is what you mean by delegation, but just wanted to be sure)
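
To make the third option concrete, here is a rough sketch. Everything in
it is hypothetical: CoreSession, FlushListener and IndexingSession are
stand-ins I made up for this mail, not existing Hibernate classes, and
the "index" is just a list. The only point is that the wrapper keeps the
operation queue itself and the wrapped session merely calls back at
flush time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback the core session would expose to extensions.
interface FlushListener {
    void beforeFlush();
}

// Stand-in for a regular session that knows nothing about indexing.
class CoreSession {
    private final List<FlushListener> listeners = new ArrayList<>();

    void addFlushListener(FlushListener listener) {
        listeners.add(listener);
    }

    void flush() {
        // Give every registered extension a chance to process its own state.
        for (FlushListener listener : listeners) {
            listener.beforeFlush();
        }
        // The regular flush work of the session would follow here.
    }
}

// Stand-in for the wrapper: delegates to the core session but owns the queue.
class IndexingSession {
    private final CoreSession delegate;
    private final List<Object> indexQueue = new ArrayList<>();
    private final List<Object> indexed = new ArrayList<>();

    IndexingSession(CoreSession delegate) {
        this.delegate = delegate;
        // Install the callback so the queue is processed on every flush,
        // even when flush() is called on the wrapped session directly.
        delegate.addFlushListener(this::processQueue);
    }

    void index(Object entity) {
        indexQueue.add(entity);
    }

    void flush() {
        delegate.flush();
    }

    int pending() {
        return indexQueue.size();
    }

    List<Object> indexedSoFar() {
        return indexed;
    }

    private void processQueue() {
        // A real implementation would write to the index in one batch here.
        indexed.addAll(indexQueue);
        indexQueue.clear();
    }
}
```

The wrapper stays a plain delegate, and the core session only needs a
generic listener hook rather than any knowledge of the queue.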

> Massive batch changes: in some systems, we don't really bother with
> "real time" index synchronization; for those a common batch changes
> queue (i.e. across several sessions) would make sense, with a queue
> flush happening every hour for example.

Isn't that something related/similar to having a very non-strict cache
with a large timeout?
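
For discussion, this is roughly how I picture the shared queue. It is
only a sketch under my own assumptions: SharedIndexQueue, enqueue and
drain are invented names, and what happens to the drained batch (the
actual index write) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical queue shared by several sessions; not an existing class.
final class SharedIndexQueue {
    // Thread-safe, so sessions on different threads can enqueue concurrently.
    private final ConcurrentLinkedQueue<Object> pending = new ConcurrentLinkedQueue<>();

    // Called by any session when an indexed entity is inserted or updated.
    void enqueue(Object change) {
        pending.add(change);
    }

    // Removes and returns everything queued so far, in arrival order.
    List<Object> drain() {
        List<Object> batch = new ArrayList<>();
        Object change;
        while ((change = pending.poll()) != null) {
            batch.add(change);
        }
        return batch;
    }
}
```

The hourly flush could then be a task that calls drain() from a
java.util.concurrent.ScheduledExecutorService via scheduleAtFixedRate and
applies the batch to the index in one go.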

> Clustered Directory: think about that. A JDBC Directory might not be the
> perfect solution actually.

Doesn't Lucene or some sister project provide clustering for Lucene yet ?

> Implement additional strategies to load objects on query.list()

what is this one ?

-- 
Max Rydahl Andersen
callto://max.rydahl.andersen

Hibernate
[EMAIL PROTECTED]
http://hibernate.org

JBoss Inc
[EMAIL PROTECTED]


_______________________________________________
hibernate-devel mailing list
hibernate-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/hibernate-devel
