Re: Lucene-based Distributed Index Leveraging Hadoop

Ning Li Thu, 07 Feb 2008 14:35:03 -0800

Doug,

I'm looking forward to the collaboration!


> My current thinking is that the Solr search API is the appropriate
> model.  Solr's facets are an important feature that require low-level

Could we make the type of shard updater/searcher and result merger
configurable in a general distributed index system? Vanilla Lucene
would be one type, Solr another, and Nutch could have its own.
Applications could also write a custom type, as long as it is
Lucene-based. In a Solr-typed system, for example, an application
sends a search request to an index client. The index client forwards
the request to the shard servers, which host Solr searchers. It then
uses the Solr result merger to merge the results from all the shards
and returns the merged result to the application.
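
To make the request flow above concrete, here is a minimal sketch of an
index client whose shard searcher and result merger are pluggable. It is
illustrative only: the names Hit, ShardSearcher, ResultMerger,
merge_by_score, InMemoryShard and IndexClient are all hypothetical and
are not part of Lucene, Solr, or Nutch, and the in-memory shard merely
stands in for a real shard server.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


@dataclass
class Hit:
    doc_id: str
    score: float


class ShardSearcher(Protocol):
    """Hypothetical interface that every shard searcher type implements."""

    def search(self, query: str, top_n: int) -> List[Hit]: ...


# Hypothetical merger signature: per-shard hit lists in, one merged list out.
ResultMerger = Callable[[List[List[Hit]], int], List[Hit]]


def merge_by_score(per_shard: List[List[Hit]], top_n: int) -> List[Hit]:
    """A plain merger: keep the top_n highest-scoring hits across all shards."""
    all_hits = [hit for hits in per_shard for hit in hits]
    return sorted(all_hits, key=lambda h: h.score, reverse=True)[:top_n]


class InMemoryShard:
    """Stand-in for a shard server; a real one would host a Lucene or Solr searcher."""

    def __init__(self, scores: Dict[str, float]):
        self.scores = scores

    def search(self, query: str, top_n: int) -> List[Hit]:
        # The toy ignores the query and returns its best-scoring documents.
        hits = [Hit(doc_id, score) for doc_id, score in self.scores.items()]
        return sorted(hits, key=lambda h: h.score, reverse=True)[:top_n]


class IndexClient:
    """Sends a request to every shard, then merges with the configured merger."""

    def __init__(self, shards: List[ShardSearcher], merger: ResultMerger):
        self.shards = shards
        self.merger = merger

    def search(self, query: str, top_n: int) -> List[Hit]:
        per_shard = [shard.search(query, top_n) for shard in self.shards]
        return self.merger(per_shard, top_n)
```

Under this sketch, a Solr-typed system would pass Solr-backed searchers
and a Solr-aware merger (for example, one that also combines facet
counts) to the same IndexClient, without changing the client itself.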

> My primary difference with your proposal is that I would like to support
> online indexing.  Documents could be inserted and removed directly, and
> shards would synchronize changes amongst replicas, with an "eventual
> consistency" model.

I've been thinking about batch update vs. online update. :)
Is it possible to support both efficiently in one system?

One could argue that a system which supports online update can
also handle batch update. However, that depends on whether the
updates on a shard server are lost when the server goes down. In a
system targeting batch update, a map/reduce job can simply
guarantee that a batch update is applied in its entirety.
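
One way to read the distinction above is sketched below. It assumes,
and this is only an assumption for illustration, that online updates
are buffered on the shard server before they are made durable, whereas
a batch update arrives as a complete index built elsewhere (for example
by a map/reduce job that can be rerun if it fails). The ShardServer
class and its methods are hypothetical.

```python
class ShardServer:
    """Toy shard: 'committed' survives a crash, 'pending' does not."""

    def __init__(self):
        self.committed = {}  # stands in for the durable index
        self.pending = {}    # online updates not yet made durable (assumption)

    def online_update(self, doc_id, text):
        # Online path: the update is visible at once but not yet durable.
        self.pending[doc_id] = text

    def flush(self):
        # Make buffered online updates durable.
        self.committed.update(self.pending)
        self.pending.clear()

    def install_batch(self, new_index):
        # Batch path: replace the durable index with one built elsewhere.
        self.committed = dict(new_index)

    def crash_and_restart(self):
        # Anything that was only buffered is gone after a restart.
        self.pending.clear()

    def visible_ids(self):
        return sorted(set(self.committed) | set(self.pending))
```

In this toy model an online update that was never flushed disappears
when the server restarts, while an installed batch is either present in
full or not at all.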

Your thoughts?

The online update you describe here is different from the one
you described in the Index Server Project proposal a while ago.
It was multi-reader single-writer before. Now it's multi-reader
multi-writer with eventual consistency. Is that because you think
the latter supports a more general usage scenario?

Regards,
Ning
