Re: Modelling relational data in Lucene Index?

Emmanuel Bernard Mon, 06 Nov 2006 07:14:02 -0800

I had a quick look at SOLR and DBSight. They seem to pursue a different goal than Hibernate Lucene. The former belong to the project box category: you set up a server that will handle the search for you, and the application then delegates the work to that server. The latter belongs to the framework category: you use it inside your Hibernate/EJB 3.0 application to enable an index-based search feature. To a certain extent, it is the same difference as between a Google box and Lucene.

You can write some code based on the latter to cover the former's features, especially the platform abstraction (PHP, .NET), but it is probably a lot of work and that is not really the point. You can write some code based on the former to enable indexing and search of your persistent domain model (persisted through Hibernate), but that is probably more work.

Really it is a matter of easing the pain from one side of the problem or the other. I don't see much competition between the two approaches; they cover different goals.


To specifically answer some of your remarks:

- yes, you need to write some code to recreate an index. Literally, 6 lines of code.
- no, I do not currently cache the searcher, because Hibernate is transactional by nature and tries to protect you as much as possible from read-uncommitted and other data inconsistencies. I guess I could implement some caching capabilities using reader.isCurrent() or something equivalent.
- the ability to split searcher servers from indexer servers is on my todo list.
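
The caching idea in the second point could look roughly like the sketch below: keep one reader and reopen it only when it is no longer current. This is a minimal illustration and not the Hibernate Lucene implementation. VersionedReader, CachedReaderHolder and FakeIndex are hypothetical names invented here so the example runs without Lucene on the classpath; only the isCurrent() check mirrors the call mentioned above.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical stand-in for an index reader; only isCurrent() mirrors the call named in the mail. */
interface VersionedReader {
    boolean isCurrent();
}

/** Caches one reader and reopens it only when the index has changed underneath it. */
final class CachedReaderHolder<R extends VersionedReader> {
    private final Supplier<R> opener; // how to open a fresh reader (assumption: cheap enough to call on demand)
    private R cached;

    CachedReaderHolder(Supplier<R> opener) {
        this.opener = opener;
    }

    /** Returns the cached reader if it is still current, otherwise opens a new one. */
    synchronized R get() {
        if (cached == null || !cached.isCurrent()) {
            // A real integration would also have to close the stale reader
            // once no running search is still using it.
            cached = opener.get();
        }
        return cached;
    }
}

/** Toy in-memory "index" whose version is bumped on every write. */
final class FakeIndex {
    private final AtomicLong version = new AtomicLong();

    void write() {
        version.incrementAndGet();
    }

    VersionedReader open() {
        final long openedAt = version.get();
        // The reader is current as long as nothing was written since it was opened.
        return () -> version.get() == openedAt;
    }
}

final class SearcherCacheDemo {
    public static void main(String[] args) {
        FakeIndex index = new FakeIndex();
        CachedReaderHolder<VersionedReader> holder = new CachedReaderHolder<>(index::open);

        VersionedReader first = holder.get();
        System.out.println(first == holder.get()); // true: reader reused while the index is unchanged

        index.write();
        System.out.println(first == holder.get()); // false: stale reader replaced after a write
    }
}
```

The trade-off is the one raised in the quoted message: reusing a reader keeps its warmed-up caches, while the currency check bounds how stale a search can be. It does not by itself give the transactional isolation described above.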


Cheers

Emmanuel


Chris Lu wrote:

I personally like your effort, but technically I would disagree.

The SOLR project, and the project I am working on, DBSight, have a
detached approach which is implementation agnostic, no matter if it's
Java, Ruby, PHP, .NET. The returned results can be rendered HTML,
JSON, XML. I don't think you can be more flexible than that. If
creating a new index takes 5 minutes without any coding, you can
create something more creative.

From the business side, you don't need to worry about indexing when
designing a system. New requirements may come. It's very hard trying to
anticipate all the needs.

Technically, the detached approach gives more flexibility on resources
like CPU, memory, hard drive. For example, if your index grows large,
say 1G, indexing can take hours with merging. I am not sure how Compass
or Hibernate/Lucene handles it. Need to rewrite code at that time? I
actually feel it's a dangerous trap.

I've introduced a session.index() which forces the (re)indexing of the
document

So does it mean you need to write some code to fix the index if it's crashed?

What do you mean by multithread safe? The indexing?
the indexing is multithread safe in the Hibernate Lucene integration

The indexing can be threadsafe. But will it affect the searching? With
many files changing and merging, if you cache the searcher, the
searching will have "read past EOF" exceptions. If you don't cache
the searcher, you will lose the built-in caching, FieldCacheImpl, in
Lucene.


The query process?
the query doesn't have to since you query on a given session (aka user
conversation), so no multithread threat here.

So you are not caching searcher.

