Re: [PROPOSAL] index server project

Bob Carpenter Tue, 21 Nov 2006 19:29:31 -0800

Doug Cutting wrote:

> It seems that Nutch and Solr would benefit from a shared index serving
> infrastructure.
>
> ...
>
> An RPC mechanism would be used to communicate between nodes (probably
> Hadoop's). The system would be configured with a single master node
> that keeps track of where indexes are located, and a number of slave
> nodes that would maintain, search and replicate indexes. Clients would
> talk to the master to find out which indexes to search or update, then
> they'll talk directly to slaves to perform searches and updates.
> ...
> Does this make sense? Does it sound like it would be useful to Solr? To
> Nutch? To others? Who would be interested and able to work on it?


Is there any way this could be generalized so that resources
other than Lucene indexes could be packaged up and distributed?


The reason I ask is that we have customers who are using
Lucene and SOLR and would like to pass other bits of their
applications around in the same way, including things we've
built from indexed data like spelling checkers, background
models for statistically interesting phrase detectors, statistical
models for topic/tag classifiers that get retrained as users
add more tags, language identifiers, etc.

From what I understand of Doug's proposal as well as
what I've seen in SOLR, there's not much that's actually
Lucene-specific about all this client/master/slave synching
other than that the data's a Lucene index.

I imagine this could be done with a generalization of the
kinds of callbacks found in SOLR, or by making what gets
passed around configurable in the proposed index server
project.
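
As a rough sketch of what such a generalization might look like, the
master could track opaque, versioned resources rather than Lucene
indexes specifically. Every name below (DistributableResource,
SimpleResource, ResourceMaster, the slave ids) is hypothetical and is
not taken from Solr, Nutch, Hadoop, or Doug's proposal; the RPC layer
is left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: anything a slave can host, not only a Lucene index.
interface DistributableResource {
    String name();    // e.g. "products-index" or "spelling-model"
    String type();    // e.g. "lucene-index" or "spell-checker"
    long version();   // bumped whenever the resource is rebuilt or retrained
}

// Hypothetical immutable descriptor for one copy of a resource.
final class SimpleResource implements DistributableResource {
    private final String name;
    private final String type;
    private final long version;

    SimpleResource(String name, String type, long version) {
        this.name = name;
        this.type = type;
        this.version = version;
    }

    public String name() { return name; }
    public String type() { return type; }
    public long version() { return version; }
}

// Hypothetical master-side table: which slaves hold the newest version
// of each named resource. Clients would ask the master where to go and
// then talk to those slaves directly.
final class ResourceMaster {
    private final Map<String, List<String>> locations = new HashMap<>();
    private final Map<String, Long> latestVersion = new HashMap<>();

    // Called when a slave reports that it holds a copy of a resource.
    void register(String slave, DistributableResource r) {
        Long known = latestVersion.get(r.name());
        if (known == null || r.version() > known) {
            // A newer version supersedes every older copy.
            latestVersion.put(r.name(), r.version());
            locations.put(r.name(), new ArrayList<String>());
        }
        long current = latestVersion.get(r.name());
        List<String> holders = locations.get(r.name());
        // Stale copies are ignored; duplicates are not recorded twice.
        if (r.version() == current && !holders.contains(slave)) {
            holders.add(slave);
        }
    }

    // Returns the slaves holding the newest version, or an empty list.
    List<String> locate(String name) {
        List<String> holders = locations.get(name);
        if (holders == null) {
            return Collections.<String>emptyList();
        }
        return new ArrayList<String>(holders);
    }
}
```

Nothing in the master's bookkeeping depends on the resource being an
index; only the slaves would need type-specific code to load and serve
what they receive.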

I'd be happy to test and help with API-level design/doc; I
don't know much about distribution mechanics, though, which
is why I'm so interested in this high-level abstraction.

- Bob Carpenter
  Alias-i
