I had a quick look at SOLR and DBSight. They seem to achieve a different goal than Hibernate Lucene. The formers belong to the project box category: you set up a server that will handle the search for you. The application will then delegate the work to those servers. The latter belongs to the framework category: you use it inside your Hibernate/EJB 3.0 application to enable an index based search feature. To a certain extend, it is the same difference between a Google box and Lucene.

You can write some code based on the latter to covers the formers features esp the platform abstraction (PHP, .net), but it is probably a lot of work and that is not really the point. You can write some code based on the formers to enable indexing and search of your persistent domain model (persisted through Hibernate), but that is probably more work.

Really it is a matter of easing the pain from one side of the problem or the other side. I don't see much competition between the 2 approaches, they cover different goals.

To specifically answer some of your remarks:
- yes, you need to write some code to recreate an index. Literally, 6 lines of code. - no, I do not currently cache the searcher because, Hibernate is transactional by nature and protect yourself as much as possible from read uncommited and other data inconsistencies. I guess I could implement some caching capabilities using reader.isCurrent() or something equivalent. - the ability to split searchers servers from indexers servers is on my todo list.

Cheers

Emmanuel


Chris Lu wrote:
I personally like your effort, but technically I would  disagree.

The SOLR project, and the project I am working on, DBSight, have an
detached approach which is implementation agnostic, no matter if it's
java, ruby, php, .net. The return results can be a rendered HTML,
JSON, XML. I don't think you can be more flexible than that. If
creating an new index takes 5 minutes without any coding, you can
create something more creative.

From business side, you don't need to worry about indexing when
designing a system. New requirement may come. It's very hard trying to
anticipate all the needs.

Technically, detached approach gives more flexible on resources like
CPU, memory, hard drive. For example, if your index grows large, say
1G, indexing can take hours with merging, I am not sure how compass or
hibernate/lucene handles it. Need to re-write code at that time? I
actually feel it's a dangerous trap.

I've introduced a session.index() which forces the (re)indexing of the
document
So does it mean you need to write some code to fix the index if it's crashed?

What do you mean by multithread safe? The indexing?
the indexing is multithread safe in the Hibernate Lucene integration
The indexing can be threadsafe. But will it affect the searching? With
many files changing and merging, if you cache the searcher. the
searching will have "read passed EOF" exceptions. If you don't cache
the searcher, you will loose the built-in caching, FieldCacheImpl, in
Lucene.


The query process?
the query doesn't have to since you query on a give session (aka user
conversation), so no multithread threat here.
So you are not caching searcher.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to