Doug, I'm looking forward to the collaboration!

> My current thinking is that the Solr search API is the appropriate
> model. Solr's facets are an important feature that require low-level

Could we make the type of shard updater/searcher and result merger
configurable in a general distributed index system? Vanilla Lucene is one
type. Solr is another. Nutch could have one. Applications could write their
own customized type, as long as it is Lucene-based.

In a Solr-typed system, for example, an application sends a search request
to an index client. The index client forwards the request to the shard
servers, which host Solr searchers. The index client then uses the Solr
result merger to merge the results from all the shards and returns the
merged result to the application.

> My primary difference with your proposal is that I would like to support
> online indexing. Documents could be inserted and removed directly, and
> shards would synchronize changes amongst replicas, with an "eventual
> consistency" model.

I've been thinking about batch update vs. online update. :) Is it possible
to support both efficiently in one system? We may say that a system which
supports online update can also handle batch update. However, that depends
on whether the updates on a shard server are lost when the server goes
down. In a system targeting batch update, a map/reduce job can simply
guarantee that a batch update is applied in its entirety. Your thoughts?

The online update you describe here is different from the one you described
in the Index Server Project proposal a while ago. It was multi-reader
single-writer before. Now it is multi-reader multi-writer with eventual
consistency. Is that because you think the latter supports a more general
usage scenario?

Regards,
Ning
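
P.S. To make the configurable-type idea a bit more concrete, here is a rough sketch. It is in Python only for brevity; the real components would be Lucene-based. Every name in it (ShardSearcher, ResultMerger, IndexType, IndexClient, the "toy" type) is hypothetical, the shards are in-process dictionaries rather than remote shard servers, and the term-count scoring is just a stand-in for a real searcher.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Tuple

Hit = Tuple[float, str]  # (score, document id)


class ShardSearcher(Protocol):
    """Hypothetical per-shard searcher interface."""

    def search(self, query: str, top_n: int) -> List[Hit]: ...


class ResultMerger(Protocol):
    """Hypothetical interface for merging per-shard results."""

    def merge(self, per_shard: List[List[Hit]], top_n: int) -> List[Hit]: ...


class TermCountSearcher:
    """Toy stand-in for a real searcher: scores by raw term count."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs

    def search(self, query: str, top_n: int) -> List[Hit]:
        term = query.lower()
        hits = [
            (float(text.lower().split().count(term)), doc_id)
            for doc_id, text in self.docs.items()
        ]
        # Keep only matching documents, best scores first.
        return heapq.nlargest(top_n, [h for h in hits if h[0] > 0])


class TopScoreMerger:
    """Merges shard results by keeping the globally highest scores."""

    def merge(self, per_shard: List[List[Hit]], top_n: int) -> List[Hit]:
        return heapq.nlargest(top_n, (h for hits in per_shard for h in hits))


@dataclass
class IndexType:
    """Bundles the pluggable pieces that define one type of index system."""

    searcher_factory: Callable[[Dict[str, str]], ShardSearcher]
    merger: ResultMerger


# Registry of configured types; a Solr or Nutch type would be added here.
INDEX_TYPES: Dict[str, IndexType] = {
    "toy": IndexType(TermCountSearcher, TopScoreMerger()),
}


class IndexClient:
    """Fans a query out to every shard and merges the results."""

    def __init__(self, type_name: str, shards: List[Dict[str, str]]) -> None:
        index_type = INDEX_TYPES[type_name]
        self.searchers = [index_type.searcher_factory(s) for s in shards]
        self.merger = index_type.merger

    def search(self, query: str, top_n: int = 10) -> List[Hit]:
        per_shard = [s.search(query, top_n) for s in self.searchers]
        return self.merger.merge(per_shard, top_n)
```

The point of the sketch is only that the index client never needs to know which type it is running: it looks up the searcher and merger for the configured type and does the same fan-out and merge in every case.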
