[ No longer cross-posting to java-dev and solr-user. ]

Andrzej Bialecki wrote:
A particular client should be able to provide a consistent read/write view by bonding to particular replicas of a shard. Thus a user who makes a modification should be able to generally see that modification in results immediately, while other users, talking to different replicas, may not see it until synchronization is complete.

This requires that we use versioning, and that we have a "shard manager" that knows the latest versions of each shard among the whole active set - or that clients discover this dynamically by querying the shard servers every now and then.

Yes, there needs to be a master that knows the shard hash function. However, I'm not sure what you mean by "versioning". In general, there is no "latest" version of a shard. Different shards have had different updates, and must, between themselves, resolve conflicts. A client would generally talk to just one replica of each shard. This is like CouchDB.

If different fields of a document are modified on different shards, then the changes can be merged. Edits to a text field might sometimes even be mergeable. But, in general, if two shards both contain unmergeable changes to the same field, one will win and one will lose. Similarly, if a document id is deleted in one shard and added in another at approximately the same time, then the addition would generally win. Thus if a single client switches which shard replica it talks to, it could lose deletions. Or if different clients attempt to modify the same document, one client's changes may be overwritten by the other's.

This is similar to the way that Amazon's Dynamo works: in the case of failures, shopping cart deletions can be lost, and deleted things may thus re-appear in one's shopping cart. This happens rarely, and confirmation is required before final sale, so it is not a big problem. Perhaps conflicts can be flagged and manually resolved by the application. Or perhaps clocks can be sufficiently synchronized that the vast majority of conflicts can be automatically resolved correctly.
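
To make the merge rules above concrete, here is a rough sketch in Python. It is only an illustration, not a proposed implementation: the DocState structure, the per-field timestamps, the merge function, and the one-second window used for "approximately the same time" are all assumptions made up for this example, and it presumes replica clocks are synchronized well enough for timestamps to be comparable.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A field value paired with the (assumed) timestamp of its last write.
Stamped = Tuple[float, str]


@dataclass
class DocState:
    """One replica's view of a single document (hypothetical structure)."""
    fields: Dict[str, Stamped] = field(default_factory=dict)
    added_at: float = 0.0               # time of the most recent add
    deleted_at: Optional[float] = None  # time of the most recent delete, if any


def merge(a: DocState, b: DocState, window: float = 1.0) -> DocState:
    """Merge two replicas' states of the same document.

    Fields changed on only one side are kept as they are. When both sides
    changed the same field, the later write wins and the earlier one is lost.
    """
    merged: Dict[str, Stamped] = {}
    for name in a.fields.keys() | b.fields.keys():
        candidates = [s.fields[name] for s in (a, b) if name in s.fields]
        # Tuples compare by timestamp first; the value only breaks exact ties.
        merged[name] = max(candidates)

    added_at = max(a.added_at, b.added_at)
    deletes = [t for t in (a.deleted_at, b.deleted_at) if t is not None]
    deleted_at = max(deletes) if deletes else None

    # An add that is later than the delete, or within `window` seconds
    # before it, beats the delete: the addition wins.
    if deleted_at is not None and added_at >= deleted_at - window:
        deleted_at = None

    return DocState(merged, added_at, deleted_at)
```

For example, merging a replica that changed only "title" with one that changed only "body" keeps both edits, while a delete at time 10.0 on one replica and an add at time 9.6 on another leaves the document present. An application that prefers to flag conflicts for manual resolution could return the losing values alongside the merged state instead of dropping them.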

Doug
