Re: [Hibernate] Hibernate Lucene integration

Emmanuel Bernard Sun, 16 Jul 2006 15:00:08 -0700

Hi,

Changes are propagated right after commit time. session.index() is a different beast because the use case is different: index() is applied at flush time so that you can flush() and clear() the session.
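
To make the flush()/clear() point concrete, here is a minimal sketch in plain Java. IndexingSession and BatchIndexer are hypothetical stand-ins, not the real LuceneSession API: they only model a queue that is processed at flush() and a persistence context that clear() empties, so a large batch never holds more than one chunk in memory.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a session whose index() work is applied at flush time. */
class IndexingSession {
    private final List<Object> persistenceContext = new ArrayList<>(); // objects held in memory
    private final List<Object> indexQueue = new ArrayList<>();         // pending index work
    private final List<Object> index = new ArrayList<>();              // stands in for the Lucene index
    private int peakHeld = 0;

    /** Simulates loading an entity: it stays in the persistence context until clear(). */
    void load(Object entity) {
        persistenceContext.add(entity);
        peakHeld = Math.max(peakHeld, persistenceContext.size());
    }

    /** Only queues the object; nothing is written to the index yet. */
    void index(Object entity) {
        indexQueue.add(entity);
    }

    /** Processes the queue: this is where the index is actually updated. */
    void flush() {
        index.addAll(indexQueue);
        indexQueue.clear();
    }

    /** Releases the loaded objects so memory use stays bounded. */
    void clear() {
        persistenceContext.clear();
    }

    int indexedCount() { return index.size(); }
    int peakHeld() { return peakHeld; }
}

class BatchIndexer {
    /** Indexes all entities, flushing and clearing every chunkSize objects. */
    static void indexAll(IndexingSession session, List<?> entities, int chunkSize) {
        int inChunk = 0;
        for (Object entity : entities) {
            session.load(entity);
            session.index(entity);
            if (++inChunk == chunkSize) {
                session.flush();
                session.clear();
                inChunk = 0;
            }
        }
        session.flush(); // process the last, possibly partial chunk
        session.clear();
    }
}
```

With a chunk size of 10, the session in this sketch never holds more than 10 objects, however many are indexed in total.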

"Any Default"
Good question. I don't think we should index all properties by default; I guess we should ask the Lucene community about that. As for the bridges (i.e. types), they are defaulted: there is a heuristic guess mechanism.

The default is FSDirectory with the base directory being ".". RAMDirectory is of little use except for some specific use cases and for unit testing.

I'm not a big fan of exposing the Lucene result itself, but the relevance is something useful. I need to think about that: the main problem is that I currently hide some of the plumbing from the user, especially the searcher opening and closing, and by doing so there is no way to give access to the Hits (Lucene results).
The ordering is preserved when returned by Hibernate.

"why session.index() a specific operation?"
Here is my reasoning:
- using a Lucene query to index a non-indexed object is going to be hard, since the Lucene query will not return the object in the first place ;-)
- using a regular Hibernate query + a flag to index the objects suffers from the OOME (OutOfMemoryError) issue unless we use the stateless session. If I use the stateless session, I can't use the event system...
- From what I've seen and guessed, what you want to (re)index is very business-specific and can be way more complex than just a query

"Session delegation and callbacks"
Yes, but Event Listeners are the current way to have a callback to the session. Event Listeners are stateless, the state being part of the events.
What we need is a way to push / keep some information at the event / PersistenceContext level. The SessionDelegate would be another way to keep some state, but it would make it hard to push the info to the event listeners.

"massive update == very non-strict rw strategy"
Could be, but that's not the main problem. The main problem is keeping the changes to apply somewhere safe, even if the VM crashes.
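
One possible way to survive a crash, sketched in plain Java: append every pending change to a journal file before acknowledging it, and delete the journal only after the changes have been applied. This is an assumption for illustration, not something the current code does; PendingChangeJournal and the "Entity#id" line format are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical journal of pending index changes, one line per change. */
class PendingChangeJournal {
    private final Path file;

    PendingChangeJournal(Path file) { this.file = file; }

    /** Appends one change and syncs it to disk before returning. */
    synchronized void record(String change) {
        try {
            Files.write(file, Collections.singletonList(change), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Applies every journaled change, including those written before a restart,
     * then deletes the journal. Returns the number of changes applied.
     */
    synchronized int applyAll(Consumer<String> applier) {
        try {
            if (!Files.exists(file)) {
                return 0;
            }
            List<String> changes = Files.readAllLines(file, StandardCharsets.UTF_8);
            for (String change : changes) {
                applier.accept(change);
            }
            Files.delete(file); // only reached if every change was applied
            return changes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the VM dies in the middle of applyAll(), the journal is still there and the changes are replayed on the next run, so the applier has to tolerate seeing the same change twice.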

"implements additional strategies to load object on query.list()"
Currently I do:
  for all results: session.load()
  for all results: session.get()
That way I benefit from the batch-size setting.

Another solution would be to use an HQL query with an IN clause containing the list of ids to load.
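
A sketch of the IN-clause alternative, in plain Java so it stands alone. An IN query gives no guarantee about the order of its rows, so the entities have to be put back into the order of the Lucene hits afterwards. HitLoader, inClause and restoreHitOrder are hypothetical names, and the alias "e" and the parameter name "ids" are arbitrary choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class HitLoader {
    /** Builds an HQL string with a single named parameter for the id list. */
    static String inClause(String entityName) {
        return "from " + entityName + " e where e.id in (:ids)";
    }

    /**
     * Reorders loaded entities to follow the order of the Lucene hits.
     * Ids with no loaded entity (e.g. deleted since indexing) are skipped.
     */
    static <I, E> List<E> restoreHitOrder(List<I> hitIds, Map<I, E> loadedById) {
        List<E> ordered = new ArrayList<>(hitIds.size());
        for (I id : hitIds) {
            E entity = loadedById.get(id);
            if (entity != null) {
                ordered.add(entity);
            }
        }
        return ordered;
    }
}
```

With a real Session, the string would go to createQuery() and the ids would be bound with setParameterList("ids", ...); the query results would then be put in a map keyed by id before calling restoreHitOrder().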

Max Rydahl Andersen wrote:

Hi Emmanuel,

Here are my comments (sorry if something is obvious from looking at the code,
but I haven't had time to look into the details yet)

> *Concepts*
> Each time you change an object state, the lucene index is updated and
> kept in sync. This is done through the Hibernate event system.

Ok - sounds cool. Is the index updated at flush or commit time? (I assume commit)

> Whether an entity is indexed or not and whether a property is indexed or
> not is defined through annotations.

Any defaults?

> You can also search through your domain model using Lucene and retrieve
> managed objects. The whole idea here is to do a nice integration between
> the search engine and the ORM without losing the search engine power,
> hence most of the API remains. To sum up, query Lucene, get managed
> object back.

Cool.

> *Mapping*
> A given entity is mapped to an index. A Lucene index is stored in a
> Directory; a Directory is Lucene's abstraction for an index storage
> system. It can be a memory directory (RAMDirectory), a file system
> directory (FSDirectory) or any other kind of backend. Hibernate Lucene
> introduces the notion of DirectoryProvider that you can configure and
> define on a per entity basis (and which is defaulted defaulted). The
> concept is very similar to ConnectionProvider.

defaulted defaulted ? (defaulted to RAMDirectory maybe ?)

> Lucene only works with Strings, so you can define a @FieldBridge which
> transforms a Java property into a Lucene Field (and potentially
> vice-versa). A simpler (useful?) version handles the transformation
> of a Java property into a String.
> Some built-in FieldBridges exist. @FieldBridge is very much like a
> Hibernate Type. In particular, I introduced the notion of precision in
> dates (year, month, .. second, millisecond). FieldBridge and
> StringBridge give a lot of flexibility in the way to design the
> property indexing.

Sounds like a good thing.
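
To picture the date precision idea, here is a minimal plain-Java sketch. The StringBridge interface, its objectToString() method, the Precision enum and the patterns are all assumptions made for illustration, not the actual Hibernate Lucene types; only the four precision levels named in the mail are modeled.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical precision levels; only the ones named in the mail are listed. */
enum Precision {
    YEAR("yyyy"),
    MONTH("yyyyMM"),
    SECOND("yyyyMMddHHmmss"),
    MILLISECOND("yyyyMMddHHmmssSSS");

    final String pattern;

    Precision(String pattern) { this.pattern = pattern; }
}

/** Hypothetical simplified bridge: turns a property value into the String to index. */
interface StringBridge {
    String objectToString(Object value);
}

class DateBridge implements StringBridge {
    private final Precision precision;

    DateBridge(Precision precision) { this.precision = precision; }

    @Override
    public String objectToString(Object value) {
        if (value == null) {
            return null;
        }
        // SimpleDateFormat is not thread-safe, so a new one is created per call.
        SimpleDateFormat format = new SimpleDateFormat(precision.pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone keeps indexed values stable
        return format.format((Date) value);
    }
}
```

A coarser precision maps every date in the same period to the same indexed term, and because the strings are fixed-width with the most significant part first, their lexicographic order matches chronological order.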

> *Querying*
> I've introduced the notion of LuceneSession which implements Session and
> actually delegates to a regular Hibernate Session. This lucene session
> has a /createLuceneQuery()/ method and a /index()/ method.
>
> /session.createLuceneQuery(lucene.Query, Class[])/ takes a Lucene query
> as a parameter and the list of targeted entities. Using a Lucene query
> as a parameter gives the full Lucene flexibility (no abstraction on top
> of it). An /org.hibernate.Query/ object is returned.
> You can (must) use pagination. A Lucene query also returns the number of
> matching results (regardless of the pagination): query.resultSize(), a
> sort of count(*).

Is there any way to get to the underlying Lucene result?
As far as I remember, Lucene also has some notion of result relevance and
ordering, which could be relevant to reach?

> Having the dynamic fetch profile would definitely be a killer pair
> (searching the lucene index, and fetching the appropriate object graph)

+1000 ;)

> /session.index(Object)/ is currently not implemented; it requires some
> modifications of SessionImpl or of LuceneSession. This feature is useful
> to initialize / refresh the index in a batch way (ie loading the data
> and applying the indexing process on this set of data).
> Basically the object is added to the index queue. At flush() time, the
> queue is processed.

Hmm... why is this specific operation needed if it is done automatically
on object changes?

And if it is something you want so that users can index not-yet-indexed
objects, couldn't it be a flag or something on the LuceneQuery?

e.g. s.createLuceneQuery("from X as x where x....").setIndex(true) or
maybe .setIndex(IndexMode.ONLY_NEW);

> design considerations:
> The delegation vs subclassing strategy for LuceneSession (ie
> LuceneSession delegating to a regular Session allowing simple wrapping
> or the LuceneSessionImpl being a subclass of SessionImpl) is an ongoing
> discussion.

> Using a subclassing model would allow the LuceneSession to keep
> operation queues (for batch indexing either through object changes or
> through session.index() ), but it does not allow a potential Hibernate -
> XXX integration on the same subclassing model. Batching is essential in
> Lucene for performance reasons.
> Using the delegation model requires some SessionImpl modifications to be
> able to keep track of a generic context. This context will keep the
> operation queues.
>
>
> *ToDo*
> Argue on the LuceneSession design and pick one (Steve/Emmanuel/Feel
> free to join the dance)

I vote for an impl that will allow an existing Session to be the basis of
extension, thus not having the Lucene integration be a hardcoded subclass...
we did the same for Configuration and that is smelly/inflexible.

We should open enough of the session up to allow such delegation.

This might be extremely hard and close to impossible, but that is what I
wish for ;)

> Find a way to keep the DocumentBuilder (sort of EntityPersister) at the
> SessionFactory level rather than the EventListener level (Steve/Emmanuel)

Finding a way of storing structured info/data relative to some of the
core concepts in SF would be useful for other things than Lucene
integration (e.g. other search, query, tooling impls etc)

> Batch changes: to do that I need to be able to keep a session related
> queue of all insert/update changes. I can't in the current design
> because SessionImpl does not have such concept and because the
> LuceneSession is built on the delegation model. We need to discuss the
> strategy here (delegation vs subclassing)

Aren't there three strategies?

The current one is LuceneSession delegates to Session

The other one is LuceneSession extends Session

The one I see as third is that LuceneSession delegates to Session, but on
that Session we install callbacks so the LuceneSession (and friends) can
maintain/participate in some of these state handling scenarios?

(maybe that is what you mean by delegation, but just wanted to be sure)
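
A rough plain-Java sketch of that third strategy, to make sure we mean the same thing. Every name here (SessionCallback, CoreSession, CallbackLuceneSession) is hypothetical and much simpler than the real Session and SessionImpl: the wrapper delegates, but it also installs a callback on the wrapped session so it can keep its own operation queue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hook a core session could expose to extensions. */
interface SessionCallback {
    void onChange(Object entity);
    void onFlush();
}

/** Hypothetical minimal core session; it knows nothing about Lucene. */
class CoreSession {
    private final List<SessionCallback> callbacks = new ArrayList<>();

    void register(SessionCallback callback) { callbacks.add(callback); }

    void saveOrUpdate(Object entity) {
        // ... persist the entity, then notify the extensions
        for (SessionCallback callback : callbacks) callback.onChange(entity);
    }

    void flush() {
        // ... flush SQL, then let the extensions process their own state
        for (SessionCallback callback : callbacks) callback.onFlush();
    }
}

/** Delegates to a CoreSession and keeps the per-session index queue itself. */
class CallbackLuceneSession implements SessionCallback {
    private final CoreSession delegate;
    private final List<Object> queue = new ArrayList<>();
    private final List<Object> indexed = new ArrayList<>(); // stands in for the Lucene index

    CallbackLuceneSession(CoreSession delegate) {
        this.delegate = delegate;
        delegate.register(this);
    }

    void saveOrUpdate(Object entity) { delegate.saveOrUpdate(entity); }

    void flush() { delegate.flush(); }

    /** Explicit indexing request: queued like any other change. */
    void index(Object entity) { queue.add(entity); }

    @Override
    public void onChange(Object entity) { queue.add(entity); }

    @Override
    public void onFlush() {
        indexed.addAll(queue); // one batch per flush
        queue.clear();
    }

    int pending() { return queue.size(); }

    int indexedCount() { return indexed.size(); }
}
```

The point of the callbacks is that the queue stays correct even when code talks to the wrapped session directly, which plain delegation alone cannot guarantee.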

> Massive batch changes: in some systems, we don't really bother with "real
> time" index synchronization, for those a common batch changes queue (ie
> across several sessions) would make sense with a queue flushing
> happening every hour for example.

Isn't that something related/similar to having a very non-strict cache
with a large timeout ?

> Clustered Directory: think about that. A JDBC Directory might not be the
> perfect solution actually.

Doesn't Lucene or some sister project provide clustering for Lucene yet ?

> implements additional strategies to load object on query.list()

what is this one ?

--
Max Rydahl Andersen
callto://max.rydahl.andersen

Hibernate
[EMAIL PROTECTED]
http://hibernate.org

JBoss Inc
[EMAIL PROTECTED]

