[ No longer cross-posting to java-dev and solr-user. ]
Andrzej Bialecki wrote:
>> A particular client should be able to provide a consistent read/write
>> view by bonding to particular replicas of a shard. Thus a user who
>> makes a modification should be able to generally see that modification
>> in results immediately, while other users, talking to different
>> replicas, may not see it until synchronization is complete.
>
> This requires that we use versioning, and that we have a "shard manager"
> that knows the latest versions of each shard among the whole active set
> - or that clients discover this dynamically by querying the shard
> servers every now and then.

Yes, there needs to be a master that knows the shard hash function.
However, I'm not sure what you mean by "versioning". In general, there
is no "latest" version of a shard. Different shards have had different
updates, and must, between themselves, resolve conflicts. A client
would generally talk to just one replica of each shard. This is like
CouchDB.

If different fields of a document are modified on different
shards, then the changes can be merged. Edits to a text field might
sometimes even be mergeable. But, in general, if two shards both contain
unmergeable changes to the same field, one will win and one will lose.
Similarly, if a document id is deleted in one shard and added in another
at approximately the same time, then the addition would generally win.
Thus if a single client switches which shard replica it talks to, then
it could possibly lose deletions. Or if different clients attempt to
modify the same document, one client's changes may be overwritten by the
other. This is similar to the way that Amazon's Dynamo works: in the
case of failures, shopping cart deletions can be lost, and deleted
things may thus re-appear in one's shopping cart. This happens rarely,
and confirmation is required before final sale, so it is not a big
problem. Perhaps conflicts can be flagged and manually resolved by the
application. Or perhaps clocks can be sufficiently synchronized that
the vast majority of conflicts can be automatically resolved correctly.
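
As a rough illustration of the merge rules described above, here is a minimal
Python sketch. Everything in it is an assumption made for the example: the
ReplicaDoc structure, the per-field write timestamps, and the `skew` window
used to decide what counts as "approximately the same time" are hypothetical,
not part of any existing Lucene or Solr API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

FieldValue = Tuple[float, str]  # (write timestamp, value); assumed representation


@dataclass
class ReplicaDoc:
    """One replica's view of a document (hypothetical structure)."""
    fields: Dict[str, FieldValue] = field(default_factory=dict)
    added_at: Optional[float] = None    # time of the last add, if any
    deleted_at: Optional[float] = None  # time of the last delete, if any


def _latest(x: Optional[float], y: Optional[float]) -> Optional[float]:
    """Return the later of two optional timestamps."""
    if x is None:
        return y
    if y is None:
        return x
    return max(x, y)


def merge(a: ReplicaDoc, b: ReplicaDoc) -> ReplicaDoc:
    """Merge two replicas' views of the same document id."""
    merged = ReplicaDoc()
    # Edits to different fields are both kept. For the same field the later
    # write wins and the other is lost; comparing whole (timestamp, value)
    # tuples breaks timestamp ties the same way on every replica.
    for name in a.fields.keys() | b.fields.keys():
        candidates = [d.fields[name] for d in (a, b) if name in d.fields]
        merged.fields[name] = max(candidates)
    merged.added_at = _latest(a.added_at, b.added_at)
    merged.deleted_at = _latest(a.deleted_at, b.deleted_at)
    return merged


def is_live(doc: ReplicaDoc, skew: float = 1.0) -> bool:
    """Addition wins unless the delete is later by more than `skew` seconds."""
    if doc.added_at is None:
        return False
    if doc.deleted_at is None:
        return True
    return doc.deleted_at - doc.added_at <= skew
```

With these rules the merge gives the same result whichever replica starts
it, but a delete that lands within `skew` of an add is dropped, which is the
lost-deletion case mentioned above.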
Doug