On 10/18/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
We assume that, within an index, a file with a given name is written
only once.

Is this necessary, and will we need the lockless patch (that avoids
renaming or rewriting *any* files), or is Lucene's current index
behavior sufficient?

I like the explicit index version and keeping the last few versions
around.  The whole idea of a master seems to lessen the amount of
manual configuration in large clusters too.
The search side seems straightforward enough, but I haven't totally
figured out how the update side should work.

Deletions could be broadcast to all slaves.  That would probably be fast
enough.

Hmmm, that does allow one to move documents around the cluster and
more easily resize things.

One potential problem is a document overwrite implemented as a delete
then an add.
More than one client doing this for the same document could result in
0 or 2 documents, instead of 1.  I guess clients will just need to be
relatively coordinated in their activities.
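
A rough sketch of how the 0-or-2 outcome could arise, using a toy
in-memory list in place of a real index.  ToyIndex and both method names
are made up for illustration; nothing here is Lucene API.  The 0 case
additionally assumes that a client's delete and add can be delivered out
of order (say, a broadcast delete that lands after the routed add).

```java
import java.util.ArrayList;
import java.util.List;

public class OverwriteRaceDemo {
    // Toy stand-in for an index: it only tracks the ids of the documents it holds.
    static final class ToyIndex {
        private final List<String> ids = new ArrayList<>();

        void delete(String id) { ids.removeIf(id::equals); }

        void add(String id) { ids.add(id); }

        int count(String id) {
            int n = 0;
            for (String s : ids) {
                if (s.equals(id)) n++;
            }
            return n;
        }
    }

    // Both clients delete before either adds, so two copies survive.
    static int bothDeleteThenBothAdd() {
        ToyIndex index = new ToyIndex();
        index.add("doc1");     // version already in the index
        index.delete("doc1");  // client A: delete
        index.delete("doc1");  // client B: delete
        index.add("doc1");     // client A: add
        index.add("doc1");     // client B: add
        return index.count("doc1");
    }

    // Assumed reordering: both adds are applied before the deletes arrive,
    // so every copy is removed.
    static int deletesArriveLast() {
        ToyIndex index = new ToyIndex();
        index.add("doc1");     // version already in the index
        index.add("doc1");     // client A: add
        index.add("doc1");     // client B: add
        index.delete("doc1");  // client A: delete, delivered late
        index.delete("doc1");  // client B: delete, delivered late
        return index.count("doc1");
    }

    public static void main(String[] args) {
        System.out.println("deletes first, then adds: " + bothDeleteThenBothAdd());
        System.out.println("adds first, then deletes: " + deletesArriveLast());
    }
}
```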

Alternately, indexes could be partitioned by a hash of each
document's unique id, permitting deletions to be routed to the
appropriate slave.

A hash is nice, but then you can't resize the number of partitions
your index is split into.
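
To illustrate the resize problem, here is a minimal sketch that assumes
the simplest routing rule, hash(id) mod number-of-partitions.
partitionFor and countMoved are hypothetical helpers, not anything from
the proposal.  Changing the partition count changes where most ids route
to, so existing documents would no longer be found by their hash.

```java
import java.util.ArrayList;
import java.util.List;

public class HashPartitionDemo {
    // Assumed routing rule: partition = hash(id) mod partition count.
    // floorMod keeps the result non-negative for negative hash codes.
    static int partitionFor(String docId, int numPartitions) {
        return Math.floorMod(docId.hashCode(), numPartitions);
    }

    // Counts the ids that would route to a different partition after a resize.
    static int countMoved(List<String> ids, int oldCount, int newCount) {
        int moved = 0;
        for (String id : ids) {
            if (partitionFor(id, oldCount) != partitionFor(id, newCount)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ids.add("doc-" + i);
        }
        System.out.println(countMoved(ids, 4, 5) + " of " + ids.size()
                + " ids route to a different partition going from 4 to 5");
    }
}
```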

It's unfortunate the master needs to be involved on every document add.
If deletes were broadcast, and documents could go to any partition,
that would be one way around it (with the downside of a less powerful
master that could implement certain distribution policies).
Another way to lessen the master-in-the-middle cost is to make sure
one can aggregate small requests:
   IndexLocation[] getUpdateableIndex(String[] id);

We might consider a delete() on the master interface too.  That way it could
 1) hide the delete policy (broadcast or direct-to-server-that-has-doc)
 2) potentially do some batching of deletes
 3) simply do the delete locally if there is a single index partition
    and this is a combination master/searcher
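
Something like the following is what I have in mind, as a sketch only.
Only getUpdateableIndex comes from the thread; the Master interface
name, the fields of IndexLocation, the delete(String[]) signature, and
BroadcastMaster are all assumptions made up for illustration.  The
broadcast policy shown is just one of the options, and round-robin
placement stands in for "documents could go to any partition".

```java
import java.util.ArrayList;
import java.util.List;

public class MasterDeleteSketch {
    // Hypothetical: identifies the server holding an index partition.
    static final class IndexLocation {
        final String host;

        IndexLocation(String host) { this.host = host; }
    }

    // Hypothetical master interface. Both calls take arrays so that
    // clients can aggregate many small requests into one round trip.
    interface Master {
        IndexLocation[] getUpdateableIndex(String[] ids);

        void delete(String[] ids);
    }

    // One possible policy: broadcast every delete to every slave.
    // Callers never see which policy is in use.
    static final class BroadcastMaster implements Master {
        private final List<IndexLocation> slaves;
        // Records "host:id" for each delete sent; stands in for a real RPC.
        final List<String> sent = new ArrayList<>();

        BroadcastMaster(List<IndexLocation> slaves) { this.slaves = slaves; }

        @Override
        public IndexLocation[] getUpdateableIndex(String[] ids) {
            // Any partition will do under broadcast deletes; use round-robin.
            IndexLocation[] out = new IndexLocation[ids.length];
            for (int i = 0; i < ids.length; i++) {
                out[i] = slaves.get(i % slaves.size());
            }
            return out;
        }

        @Override
        public void delete(String[] ids) {
            // The whole batch goes to each slave in turn.
            for (IndexLocation slave : slaves) {
                for (String id : ids) {
                    sent.add(slave.host + ":" + id);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<IndexLocation> slaves = new ArrayList<>();
        slaves.add(new IndexLocation("slave0"));
        slaves.add(new IndexLocation("slave1"));
        BroadcastMaster master = new BroadcastMaster(slaves);
        master.delete(new String[] {"a", "b"});
        System.out.println(master.sent);
    }
}
```

A direct-to-server policy or a local delete for the single-partition
case would be other implementations of the same interface.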

It seems like the master might want to be involved in commits too, or
maybe we just rely on the slave-to-master heartbeat to kick off
immediately after a commit so that index replication can be initiated?

Does this make sense?  Does it sound like it would be useful to Solr?
To Nutch?  To others?  Who would be interested and able to work on it?

Still interested, and able :-)

-Yonik
