On 17 Apr 2013, at 13:23, Dan Berindei wrote:

> I like the idea of always clearing the state in members of the minority
> partition(s), but one problem with that is that there may be some keys that
> only had owners in the minority partition(s). If we wiped the state of the
> minority partition members, those keys would be lost.

Indeed, data consistency is lost the moment we have a partition with numOwners members. So with the read-only cluster approach we can only target *eventual* consistency, which is reached when the partitions are merged.

> Of course, you could argue that the cluster already lost those keys when we
> allowed the majority partition to continue working without having those
> keys... We could also rely on the topology information, and say that we only
> support partitioning when numOwners >= numSites (or numRacks, if there is
> only one site, or numMachines, if there is a single rack).

Good point re: topology. That assumes there won't be any split brains within the same site (or rack), which I'm not sure holds true in general. Bela, care to comment?

> One other option is to perform a more complicated post-merge state transfer,
> in which each partition sends all the data it has to all the other
> partitions, and on the receiving end each node has a "conflict resolution"
> component that can merge two values. That is definitely more complicated than
> just going with a primary partition, though.

+1

> One final point... when a node comes back online and it has a local cache
> store, it is very much as if we had a merge view. The current approach is to
> join as if the node didn't have any data, then delete everything from the
> cache store that is not mapped to the node in the consistent hash.

With this approach a value that has been deleted within the cluster might be resurrected. Wouldn't it be better to delete everything from the cache store?

> Obviously that can lead to consistency problems, just like our current merge
> algorithm. It would be nice if we could handle both these cases the same way.

+1. The cache store is the equivalent of the read-only partition.
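
To make the key-loss scenario from the first point concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: the class name, the method name, and the plain `Map` from key to owner list are all hypothetical and are not part of Infinispan's API or its consistent hash implementation. It simply reports which keys would disappear if every member of the minority partition were wiped.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not Infinispan code.
public class MinorityWipeSketch {

    /**
     * Returns the keys whose owners ALL sit in the minority partition.
     * Wiping the minority members would lose these keys, because no
     * member of the majority partition holds a copy.
     */
    public static Set<String> keysLostIfMinorityWiped(
            Map<String, List<String>> ownersByKey,
            Set<String> minorityMembers) {
        Set<String> lost = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : ownersByKey.entrySet()) {
            List<String> owners = e.getValue();
            // A key with no owners at all is ignored rather than reported.
            if (!owners.isEmpty() && minorityMembers.containsAll(owners)) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // Assumed setup: five nodes A..E, numOwners = 2,
        // and a split that isolates D and E from A, B and C.
        Map<String, List<String>> owners = Map.of(
                "k1", List.of("A", "B"),
                "k2", List.of("C", "D"),
                "k3", List.of("D", "E"));
        Set<String> minority = Set.of("D", "E");

        // k2 survives through its copy on C; k3 has no copy outside D and E.
        System.out.println(keysLostIfMinorityWiped(owners, minority)); // prints [k3]
    }
}
```

With numOwners = 2, any minority partition of two or more nodes can hold every owner of some key, which is why wiping it cannot be made safe without extra constraints such as the topology rule mentioned above.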

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)
