Hi,
Changes are propagated right after commit time. session.index() is a
different beast because the use case is different: index() will be
applied at flush time so that you can flush() and clear().
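To make the flush()/clear() point concrete, here is a rough sketch in
plain Java. It does not use the real Hibernate or Lucene classes:
IndexQueueSketch, its methods and the batch size of 10 are made-up
stand-ins that only model "queue on index(), write on flush(), detach on
clear()".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a session whose index() queues work until flush().
class IndexQueueSketch {
    private final List<Object> queue = new ArrayList<>();   // pending index operations
    private final List<Object> index = new ArrayList<>();   // stands in for the Lucene index
    private final List<Object> managed = new ArrayList<>(); // stands in for the persistence context

    // Register the object with the session and queue it; nothing is written yet.
    void index(Object entity) {
        managed.add(entity);
        queue.add(entity);
    }

    // Apply every queued operation to the index, then empty the queue.
    void flush() {
        index.addAll(queue);
        queue.clear();
    }

    // Detach all managed objects so memory use stays bounded.
    void clear() {
        managed.clear();
    }

    int indexedCount() { return index.size(); }
    int managedCount() { return managed.size(); }
    int queuedCount() { return queue.size(); }

    // Index a large data set in batches: flush, then clear, every batchSize objects.
    static IndexQueueSketch indexInBatches(List<?> entities, int batchSize) {
        IndexQueueSketch session = new IndexQueueSketch();
        int count = 0;
        for (Object entity : entities) {
            session.index(entity);
            if (++count % batchSize == 0) {
                session.flush();
                session.clear();
            }
        }
        session.flush(); // process the last, possibly partial, batch
        session.clear();
        return session;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            data.add(i);
        }
        IndexQueueSketch s = indexInBatches(data, 10);
        System.out.println(s.indexedCount() + " indexed, " + s.managedCount() + " still managed");
    }
}
```

The point of the pattern is that the number of objects held in memory
never exceeds one batch, whatever the size of the data set.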
"Any Default"
Good question. I don't think we should index all properties by default;
I guess we should ask the Lucene community about that. As for the
bridges (i.e. types), they are defaulted: there is a heuristic guess
mechanism.
The default is FSDirectory with the base directory being ".".
RAMDirectory is of little use except for some specific use cases and for
unit testing.
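As an illustration of "defaulted, but configurable per entity", here is
a small sketch in plain Java. It is not the DirectoryProvider contract
itself: IndexLocationSketch, override() and resolve() are hypothetical
names, and the "one sub-directory per entity under the base directory"
layout is my assumption. Only the "." default comes from the above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: decides where the index files of each entity live.
class IndexLocationSketch {
    private final Path defaultBase;
    private final Map<String, Path> perEntityBase = new HashMap<>();

    // Default base directory is ".", as for the file system default above.
    IndexLocationSketch() {
        this(Paths.get("."));
    }

    IndexLocationSketch(Path defaultBase) {
        this.defaultBase = defaultBase;
    }

    // Per-entity override of the base directory.
    void override(String entityName, Path base) {
        perEntityBase.put(entityName, base);
    }

    // Assumed layout: <base>/<entityName>; falls back to the default base.
    Path resolve(String entityName) {
        Path base = perEntityBase.getOrDefault(entityName, defaultBase);
        return base.resolve(entityName);
    }

    public static void main(String[] args) {
        IndexLocationSketch locations = new IndexLocationSketch();
        locations.override("Author", Paths.get("indexes"));
        System.out.println(locations.resolve("Book"));
        System.out.println(locations.resolve("Author"));
    }
}
```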
I'm not a big fan of exposing the Lucene result itself, but the
relevance is something useful. I need to think about that: the main
problem is that I currently hide some of the plumbing from the user,
especially the searcher opening and closing. By doing so, there is no
way to hand out the Hits (Lucene results).
The ordering is preserved when returned by Hibernate.
"why session.index() a specific operation?"
Here is my reasoning:
- using a Lucene query to index a non-indexed object is going to be hard
since the Lucene query will not return the object in the first place ;-)
- using a regular Hibernate query + a flag to index the objects
suffers from the OOME issue unless we use the stateless session. If I
use the stateless session, I can't use the event system...
- from what I've seen and guessed, what you want to (re)index is very
business specific and can be way more complex than just a query
"Session delegation and callbacks"
Yes, but Event Listeners are the current way to have a callback to the
session. Event Listeners are stateless, the state being part of the
events.
What we need is a way to push / keep some information at the event /
PersistenceContext level. The SessionDelegate would be another way to
keep some state, but it makes it hard to push the info to the event
listeners.
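A minimal sketch of what "stateless listener, state reachable through
the event" could look like, in plain Java. None of these types are
Hibernate classes: SessionContext, ChangeEvent, QueueingListener and the
"lucene.queue" key are hypothetical names used only to show the shape of
the idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EventContextSketch {

    // Stands in for the per-session generic context; one instance per session.
    static class SessionContext {
        private final Map<String, List<Object>> queues = new HashMap<>();

        // Return the queue stored under the key, creating it on first use.
        List<Object> queue(String key) {
            return queues.computeIfAbsent(key, k -> new ArrayList<>());
        }
    }

    // The event carries the changed entity plus a handle on the session context.
    static class ChangeEvent {
        final Object entity;
        final SessionContext context;

        ChangeEvent(Object entity, SessionContext context) {
            this.entity = entity;
            this.context = context;
        }
    }

    // Stateless listener: no instance fields, so one instance serves all sessions.
    static class QueueingListener {
        static final String KEY = "lucene.queue";

        void onChange(ChangeEvent event) {
            event.context.queue(KEY).add(event.entity);
        }
    }

    public static void main(String[] args) {
        QueueingListener listener = new QueueingListener();
        SessionContext first = new SessionContext();
        SessionContext second = new SessionContext();
        listener.onChange(new ChangeEvent("book-1", first));
        listener.onChange(new ChangeEvent("author-1", second));
        System.out.println(first.queue(QueueingListener.KEY));
        System.out.println(second.queue(QueueingListener.KEY));
    }
}
```

The listener itself holds nothing; each session's queue travels with the
events raised by that session.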
"massive update == very non-strict rw strategy"
Could be, but that's not the main problem. The main problem is to keep
the changes to apply somewhere, even if the VM crashes.
"implements additional strategies to load object on query.list()"
Currently I do
for all results: session.load
for all results: session.get
That way I benefit from the batch-size.
Another solution would be to use an HQL query with an IN clause
containing the list of ids to load.
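Here is a rough sketch of the IN clause variant, in plain Java so it
runs without Hibernate. A database is free to return the rows of an IN
query in any order, so the loaded entities have to be put back in the
order Lucene ranked them. InClauseLoadSketch, buildHql() and
inLuceneOrder() are hypothetical helpers, and the identifier property
being called "id" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class InClauseLoadSketch {

    // HQL with a named list parameter; assumes the identifier property is "id".
    // With Hibernate the ids would be bound through Query.setParameterList("ids", ids).
    static String buildHql(String entityName) {
        return "from " + entityName + " e where e.id in (:ids)";
    }

    // Re-order the loaded entities so they follow the order of the ranked ids.
    static <T> List<T> inLuceneOrder(List<Long> rankedIds, List<T> loaded, Function<T, Long> idOf) {
        Map<Long, T> byId = new HashMap<>();
        for (T entity : loaded) {
            byId.put(idOf.apply(entity), entity);
        }
        List<T> ordered = new ArrayList<>();
        for (Long id : rankedIds) {
            T entity = byId.get(id);
            if (entity != null) { // skip ids whose row was not returned
                ordered.add(entity);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Long> ranked = Arrays.asList(2L, 3L, 1L);
        // Pretend the database returned the rows in a different order.
        List<String> loaded = Arrays.asList("e3", "e1", "e2");
        System.out.println(buildHql("Book"));
        System.out.println(inLuceneOrder(ranked, loaded, s -> Long.valueOf(s.substring(1))));
    }
}
```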
Max Rydahl Andersen wrote:
Re: [Hibernate] Hibernate Lucene integration
Hi Emmanuel,
Here are my comments (sorry if something is obvious from looking at the
code, but haven't had time to look into the details yet)
> *Concepts*
> Each time you change an object state, the lucene index is updated and
> kept in sync. This is done through the Hibernate event system.
Ok - sounds cool. Is the index updated at flush or commit time? (I
assume commit)
> Whether an entity is indexed or not and whether a property is indexed
> or not is defined through annotations.
Any defaults?
> You can also search through your domain model using Lucene and
> retrieve managed objects. The whole idea here is to do a nice
> integration between the search engine and the ORM without losing the
> search engine power, hence most of the API remains. To sum up, query
> Lucene, get managed objects back.
Cool.
> *Mapping*
> A given entity is mapped to an index. A Lucene index is stored in a
> Directory; a Directory is a Lucene abstract concept for an index
> storage system. It can be a memory directory (RAMDirectory), a file
> system directory (FSDirectory) or any other kind of backend. Hibernate
> Lucene introduces the notion of DirectoryProvider that you can
> configure and define on a per entity basis (and which is defaulted
> defaulted). The concept is very similar to ConnectionProvider.
defaulted defaulted ? (defaulted to RAMDirectory maybe ?)
> Lucene only works with Strings, so you can define a @FieldBridge
> which transforms a java property into a Lucene Field (and potentially
> vice-versa). A simpler (more useful?) version handles the
> transformation of a java property into a String.
> Some built-in FieldBridges exist. @FieldBridge is very much like a
> Hibernate Type. In particular, I introduced the notion of precision in
> dates (year, month, .. second, millisecond). FieldBridge and
> StringBridge give a lot of flexibility in the way to design the
> property indexing.
Sounds like a good thing.
> *Querying*
> I've introduced the notion of LuceneSession which implements Session
> and actually delegates to a regular Hibernate Session. This Lucene
> session has a /createLuceneQuery()/ method and an /index()/ method.
>
> /session.createLuceneQuery(lucene.Query, Class[])/ takes a Lucene
> query as a parameter and the list of targeted entities. Using a Lucene
> query as a parameter gives the full Lucene flexibility (no abstraction
> on top of it). An /org.hibernate.Query/ object is returned.
> You can (must) use pagination. A Lucene query also returns the number
> of matching results (regardless of the pagination):
> query.resultSize(), a sort of count(*).
Is there any way to get to the underlying Lucene result?
As far as I remember Lucene also has some notion of result relevance and
ordering which could be relevant to reach?
> Having the dynamic fetch profile would definitely be a killer pair
> (searching the lucene index, and fetching the appropriate object
> graph)
+1000 ;)
> /session.index(Object)/ is currently not implemented; it requires some
> modifications of SessionImpl or of LuceneSession. This feature is
> useful to initialize / refresh the index in a batch way (i.e. loading
> the data and applying the indexing process on this set of data).
> Basically the object is added to the index queue. At flush() time, the
> queue is processed.
hmm...why is this specific operation needed if it is done automatically
on object changes?
And if it is something you want in order to let users index
not-yet-indexed objects, couldn't it be a flag or something on the
LuceneQuery?
e.g. s.createLuceneQuery("from X as x where x....").setIndex(true) or
maybe .setIndex(IndexMode.ONLY_NEW);
> design considerations:
> The delegation vs subclassing strategy for LuceneSession (i.e.
> LuceneSession delegating to a regular Session allowing simple wrapping
> or the LuceneSessionImpl being a subclass of SessionImpl) is an
> ongoing discussion.
> Using a subclassing model would allow the LuceneSession to keep
> operation queues (for batch indexing either through object changes or
> through session.index() ), but it does not allow a potential
> Hibernate - XXX integration on the same subclassing model. Batching is
> essential in Lucene for performance reasons.
> Using the delegation model requires some SessionImpl modifications to
> be able to keep track of a generic context. This context will keep the
> operation queues.
>
>
> *ToDo*
> Argue on the LuceneSession design and pick one (Steve/Emmanuel/Feel
> free to join the dance)
I vote for an impl that will allow an existing Session to be the basis
of extension; thus not having the Lucene integration be a hardcoded
subclass....we did the same for Configuration and that is
smelly/inflexible.
We should open enough of the session up to allow such delegation.
This might be extremely hard and close to impossible, but that is what I
wish for ;)
> Find a way to keep the DocumentBuilder (sort of EntityPersister) at
> the SessionFactory level rather than the EventListener level
> (Steve/Emmanuel)
Finding a way of storing structured info/data relative to some of the
core concepts in SF would be useful for other things than Lucene
integration (e.g. other search, query, tooling impls etc)
> Batch changes: to do that I need to be able to keep a session related
> queue of all insert/update changes. I can't in the current design
> because SessionImpl does not have such a concept and because the
> LuceneSession is built on the delegation model. We need to discuss the
> strategy here (delegation vs subclassing)
Aren't there three strategies?
The current one is LuceneSession delegates to Session
The other one is LuceneSession extends Session
The one I see as third is that LuceneSession delegates to Session, but
on that Session we install callbacks so the LuceneSession (and friends)
can maintain/participate in some of these state handling scenarios?
(maybe that is what you mean by delegation, but just wanted to be sure)
> Massive batch changes: in some systems, we don't really bother with
> "real time" index synchronization, for those a common batch changes
> queue (i.e. across several sessions) would make sense with a queue
> flushing happening every hour for example.
Isn't that something related/similar to having a very non-strict cache
with a large timeout?
> Clustered Directory: think about that. A JDBC Directory might not be
> the perfect solution actually.
Doesn't Lucene or some sister project provide clustering for Lucene yet?
> implements additional strategies to load object on query.list()
what is this one ?
--
Max Rydahl Andersen
callto://max.rydahl.andersen
Hibernate
[EMAIL PROTECTED]
http://hibernate.org
JBoss Inc
[EMAIL PROTECTED]