On 17 Apr 2013, at 13:23, Dan Berindei wrote:

> I like the idea of always clearing the state in members of the minority 
> partition(s), but one problem with that is that there may be some keys that 
> only had owners in the minority partition(s). If we wiped the state of the 
> minority partition members, those keys would be lost.
Indeed, data consistency is lost the moment we have a partition with numOwners
members. So with the read-only cluster approach, we can only target *eventual*
consistency, i.e. consistency is restored only once the partitions are merged.
> 
> Of course, you could argue that the cluster already lost those keys when we 
> allowed the majority partition to continue working without having those 
> keys... We could also rely on the topology information, and say that we only 
> support partitioning when numOwners >= numSites (or numRacks, if there is 
> only one site, or numMachines, if there is a single rack).
Good point re: topology. 
That assumes that there won't be any split brains in the same site (or rack), 
which I'm not sure holds true in general. Bela, care to comment?
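
To make the topology rule concrete, here is a rough sketch of the check as I
read it. The function name and parameters are made up for illustration and are
not Infinispan API; it also assumes, as noted above, that splits only happen
along the coarsest topology boundary.

```python
def supports_partitioning(num_owners, num_sites, num_racks, num_machines):
    """Hypothetical check: is numOwners large enough for the topology?

    Picks the coarsest topology level that has more than one unit
    (sites, then racks, then machines) and requires one owner per unit.
    """
    if num_sites > 1:
        units = num_sites
    elif num_racks > 1:
        units = num_racks
    else:
        units = num_machines
    return num_owners >= units


# Two sites, numOwners=2: each site can hold a copy of every key.
print(supports_partitioning(2, num_sites=2, num_racks=4, num_machines=8))
# One site, three racks, numOwners=2: some keys may have no copy in a rack.
print(supports_partitioning(2, num_sites=1, num_racks=3, num_machines=6))
```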
> 
> One other option is to perform a more complicated post-merge state transfer, 
> in which each partition sends all the data it has to all the other 
> partitions, and on the receiving end each node has a "conflict resolution" 
> component that can merge two values. That is definitely more complicated than 
> just going with a primary partition, though.
+1
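
A minimal sketch of what such a merge could look like, just to illustrate the
idea. Everything here is hypothetical: the resolver signature, the
(version, payload) value shape and the "highest version wins" policy are
assumptions for the example, not something we have today.

```python
def merge_partitions(partitions, resolve):
    """Merge the data of all partitions into one view.

    partitions: list of dicts, one per partition (key -> value).
    resolve:    callable (key, value_a, value_b) -> merged value, used
                only when two partitions both hold the same key.
    """
    merged = {}
    for data in partitions:
        for key, value in data.items():
            if key in merged:
                merged[key] = resolve(key, merged[key], value)
            else:
                merged[key] = value
    return merged


def highest_version_wins(key, a, b):
    # Assumes values are (version, payload) tuples; ties keep the first.
    return a if a[0] >= b[0] else b


majority = {"k1": (3, "new"), "k2": (1, "y")}
minority = {"k1": (2, "old"), "k3": (1, "z")}
print(merge_partitions([majority, minority], highest_version_wins))
```

Note that k3, which only had owners in the minority partition, survives the
merge here, which is exactly what wiping the minority would lose.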
> 
> One final point... when a node comes back online and it has a local cache 
> store, it is very much as if we had a merge view. The current approach is to 
> join as if the node didn't have any data, then delete everything from the 
> cache store that is not mapped to the node in the consistent hash.
With this approach, a value that has been deleted within the cluster might
be resurrected. Wouldn't it be better to delete everything from the cache store?
> Obviously that can lead to consistency problems, just like our current merge 
> algorithm. It would be nice if we could handle both these cases the same way.
+1. The cache store is the equivalent of the read-only partition.
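
To illustrate the resurrection problem with a local cache store, here is a toy
model of a node rejoining. It is only a sketch under simplifying assumptions:
plain dicts stand in for the store and the cluster state, and the function
name and the wipe_store flag are invented for the example.

```python
def rejoin(store, cluster_state, owned_keys, wipe_store):
    """Return the contents of the node's cache store after it rejoins.

    store:         what the node had on disk before going offline.
    cluster_state: current key -> value view of the running cluster.
    owned_keys:    keys the consistent hash maps to this node.
    wipe_store:    True  = delete everything from the store on join;
                   False = only delete keys not mapped to this node.
    """
    if wipe_store:
        store = {}
    else:
        store = {k: v for k, v in store.items() if k in owned_keys}
    # State transfer: the node receives the cluster's current values
    # for the keys it owns.
    for key in owned_keys:
        if key in cluster_state:
            store[key] = cluster_state[key]
    return store


old_store = {"a": 1, "b": 2, "c": 3}
cluster = {"a": 10}            # "b" was deleted while the node was offline
owned = {"a", "b"}

print(rejoin(old_store, cluster, owned, wipe_store=False))  # "b" comes back
print(rejoin(old_store, cluster, owned, wipe_store=True))   # "b" stays deleted
```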

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev