Hi Emmanuel,

Here are my comments (sorry if something is obvious from looking at the code, but I haven't had time to look into the details yet).

> *Concepts*
> Each time you change an object state, the Lucene index is updated and
> kept in sync. This is done through the Hibernate event system.

OK, sounds cool. Is the index updated at flush time or at commit time? (I assume commit.)

> Whether an entity is indexed or not and whether a property is indexed or
> not is defined through annotations.

Any defaults?

> You can also search through your domain model using Lucene and retrieve
> managed objects. The whole idea here is to do a nice integration between
> the search engine and the ORM without losing the search engine power,
> hence most of the API remains. To sum up, query Lucene, get managed
> object back.

Cool.

> *Mapping*
> A given entity is mapped to an index. A Lucene index is stored in a
> Directory, a Directory is a Lucene abstract concept for index storage
> system. It can be a memory directory (RAMDirectory), a file system
> directory (FSDirectory) or any other kind of backend. Hibernate Lucene
> introduces the notion of DirectoryProvider that you can configure and
> define on a per entity basis (and which is defaulted defaulted). The
> concept is very similar to ConnectionProvider.

"defaulted defaulted"? (Defaulted to RAMDirectory, maybe?)

> Lucene only works with Strings, so you can define a @FieldBridge which
> transforms a java property into a Lucene Field (and potentially
> vice-versa). A more simple (useful?) version handles the transformation
> of a java property into a String.
> Some built-in FieldBridges exist. @FieldBridge is very much like a
> Hibernate Type. Esp I introduced the notion of precision in dates (year,
> month, .. second, millisecond). This FieldBridge and StringBridge give
> a lot of flexibility in the way to design the property indexing.

Sounds like a good thing.

> *Querying*
> I've introduced the notion of LuceneSession which implements Session and
> actually delegates to a regular Hibernate Session. This Lucene session
> has a /createLuceneQuery()/ method and a /index()/ method.
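
Coming back to the date precision point in the mapping part: just to check my understanding, here is a rough sketch of what I imagine such a bridge does. Resolution and DateStringBridge are names I made up for illustration, and I use java.time only to keep it short; none of this is taken from the actual Hibernate Lucene code.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only: these are not the real Hibernate Lucene types.
enum Resolution {
    YEAR("yyyy"),
    MONTH("yyyyMM"),
    DAY("yyyyMMdd"),
    HOUR("yyyyMMddHH"),
    MINUTE("yyyyMMddHHmm"),
    SECOND("yyyyMMddHHmmss"),
    MILLISECOND("yyyyMMddHHmmssSSS");

    final DateTimeFormatter formatter;

    Resolution(String pattern) {
        this.formatter = DateTimeFormatter.ofPattern(pattern);
    }
}

// Assumed shape of a "property to String" bridge for dates.
final class DateStringBridge {
    private final Resolution resolution;

    DateStringBridge(Resolution resolution) {
        this.resolution = resolution;
    }

    // Drops everything finer than the configured resolution. The result is a
    // fixed-width digit string, so for four-digit years plain string ordering
    // matches chronological ordering.
    String objectToString(LocalDateTime value) {
        return value == null ? null : resolution.formatter.format(value);
    }
}
```

With something like this, two dates in the same month map to the same indexed term at MONTH resolution, which I guess is the point of choosing a coarser precision.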
>
> /session.createLuceneQuery(lucene.Query, Class[])/ takes a Lucene query
> as a parameter and the list of targeted entities. Using a Lucene query
> as a parameter gives the full Lucene flexibility (no abstraction on top
> of it). An /org.hibernate.Query/ object is returned.
> You can (must) use pagination. A Lucene query also returns the number of
> matching results (regardless of the pagination): query.resultSize(), sort
> of count(*).

Is there any way to get to the underlying Lucene result? As far as I remember, Lucene also has some notion of result relevance and ordering, which could be relevant to expose.

> Having the dynamic fetch profile would definitely be a killer pair
> (searching the Lucene index, and fetching the appropriate object graph)

+1000 ;)

> /session.index(Object)/ is currently not implemented; it requires some
> modifications of SessionImpl or of LuceneSession. This feature is useful
> to initialize / refresh the index in a batch way (ie loading the data
> and applying the indexing process on this set of data).
> Basically the object is added to the index queue. At flush() time, the
> queue is processed.

Hmm... why is this specific operation needed if indexing is done automatically on object changes? And if it is meant to let users index not-yet-indexed objects, couldn't it be a flag or something on the LuceneQuery? E.g.

    s.createLuceneQuery("from X as x where x....").setIndex(true)

or maybe

    .setIndex(IndexMode.ONLY_NEW);

> design considerations:
> The delegation vs subclassing strategy for LuceneSession (ie
> LuceneSession delegating to a regular Session allowing simple wrapping
> or the LuceneSessionImpl being a subclass of SessionImpl) is an ongoing
> discussion.
> Using a subclassing model would allow the LuceneSession to keep
> operation queues (for batch indexing either through object changes or
> through session.index() ), but it does not allow a potential Hibernate -
> XXX integration on the same subclassing model.
> Batching is essential in Lucene for performance reasons.
> Using the delegation model requires some SessionImpl modifications to be
> able to keep track of a generic context. This context will keep the
> operation queues.
>
> *ToDo*
> Argue on the LuceneSession design and pick one (Steve/Emmanuel/Feel
> free to join the dance)

I vote for an impl that allows an existing Session to be the basis of extension, so that the Lucene integration is not a hardcoded subclass... we did the same for Configuration and that is smelly/inflexible. We should open up enough of the session to allow such delegation. This might be extremely hard and close to impossible, but that is what I wish for ;)

> Find a way to keep the DocumentBuilder (sort of EntityPersister) at the
> SessionFactory level rather than the EventListener level (Steve/Emmanuel)

Finding a way of storing structured info/data relative to some of the core concepts in SF would be useful for things other than Lucene integration (e.g. other search, query, tooling impls etc.)

> Batch changes: to do that I need to be able to keep a session related
> queue of all insert/update changes. I can't in the current design
> because SessionImpl does not have such concept and because the
> LuceneSession is built on the delegation model. We need to discuss the
> strategy here (delegation vs subclassing)

Aren't there three strategies? The current one is that LuceneSession delegates to Session. The other one is that LuceneSession extends Session. The third one I see is that LuceneSession delegates to Session, but on that Session we install callbacks so the LuceneSession (and friends) can maintain/participate in some of these state handling scenarios?
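
To make the third option a bit more concrete, here is a very rough sketch of what I have in mind. CoreSession, FlushCallback and IndexingSession are invented stand-ins, not the real Session/SessionImpl API, and the "indexing" is just a list; the only point is that the wrapper keeps its own batch queue while the core session knows nothing about Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// All types below are hypothetical stand-ins, not Hibernate API.
interface FlushCallback {
    void entityChanged(Object entity);
    void beforeFlush();
}

// Stand-in for the core session: it only knows about generic callbacks.
class CoreSession {
    private final List<FlushCallback> callbacks = new ArrayList<>();

    void registerCallback(FlushCallback callback) {
        callbacks.add(callback);
    }

    void saveOrUpdate(Object entity) {
        // A real session would persist here; the sketch only notifies listeners.
        for (FlushCallback callback : callbacks) {
            callback.entityChanged(entity);
        }
    }

    void flush() {
        for (FlushCallback callback : callbacks) {
            callback.beforeFlush();
        }
    }
}

// Stand-in for LuceneSession: delegates to the core session and keeps the
// per-session operation queue itself.
class IndexingSession {
    private final CoreSession delegate;
    private final List<Object> queue = new ArrayList<>();
    final List<Object> indexed = new ArrayList<>(); // stands in for the index

    IndexingSession(CoreSession delegate) {
        this.delegate = delegate;
        delegate.registerCallback(new FlushCallback() {
            public void entityChanged(Object entity) {
                queue.add(entity);
            }

            public void beforeFlush() {
                // Process the whole queue in one batch at flush time.
                indexed.addAll(queue);
                queue.clear();
            }
        });
    }

    void saveOrUpdate(Object entity) {
        delegate.saveOrUpdate(entity);
    }

    // Explicit indexing request: queued like any other change.
    void index(Object entity) {
        queue.add(entity);
    }

    void flush() {
        delegate.flush();
    }

    int pendingCount() {
        return queue.size();
    }
}
```

Because the callback sits on the core session, changes made directly through it still end up in the wrapper's queue, and a second integration could register its own callback on the same session without any subclassing.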
(maybe that is what you mean by delegation, but I just wanted to be sure)

> Massive batch changes: in some systems, we don't really bother with "real
> time" index synchronization, for those a common batch changes queue (ie
> across several sessions) would make sense with a queue flushing
> happening every hour for example.

Isn't that related/similar to having a very non-strict cache with a large timeout?

> Clustered Directory: think about that. A JDBC Directory might not be the
> perfect solution actually.

Doesn't Lucene or some sister project provide clustering for Lucene yet?

> implements additional strategies to load object on query.list()

What is this one?

--
Max Rydahl Andersen
callto://max.rydahl.andersen

Hibernate
[EMAIL PROTECTED]
http://hibernate.org

JBoss Inc
[EMAIL PROTECTED]

_______________________________________________
hibernate-devel mailing list
hibernate-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/hibernate-devel