On 17 Apr 2013, at 16:53, Manik Surtani wrote: >> Of course, you could argue that the cluster already lost those keys when we >> allowed the majority partition to continue working without having those >> keys... We could also rely on the topology information, and say that we only >> support partitioning when numOwners >= numSites (or numRacks, if there is >> only one site, or numMachines, if there is a single rack). > > This is only true for an embedded app. For an app communicating with the > cluster over Hot Rod, this isn't the case as it could directly read from the > minority partition.
I don't think it's that simple: - k1 is in the minority partition *only* - it is removed/updated in the majority partition - then read in the minority partition > >> One other option is to perform a more complicated post-merge state transfer, >> in which each partition sends all the data it has to all the other >> partitions, and on the receiving end each node has a "conflict resolution" >> component that can merge two values. That is definitely more complicated >> than just going with a primary partition, though. > > That sounds massively expensive. if you move only the keys + version around (not the values) shouldn't be that bad from a load perspective. > I think the right solution at this point is entry versioning using vector > clocks and the vector clocks are exchanged and compared during a merge. Not > the entire dataset. +1 > >> One final point... when a node comes back online and it has a local cache >> store, it is very much as if we had a merge view. The current approach is to >> join as if the node didn't have any data, then delete everything from the >> cache store that is not mapped to the node in the consistent hash. Obviously >> that can lead to consistency problems, just like our current merge >> algorithm. It would be nice if we could handle both these cases the same way. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) _______________________________________________ infinispan-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/infinispan-dev
