On 17 Apr 2013, at 16:53, Manik Surtani wrote:

>> Of course, you could argue that the cluster already lost those keys when we 
>> allowed the majority partition to continue working without having those 
>> keys... We could also rely on the topology information, and say that we only 
>> support partitioning when numOwners >= numSites (or numRacks, if there is 
>> only one site, or numMachines, if there is a single rack).
> 
> This is only true for an embedded app.  For an app communicating with the 
> cluster over Hot Rod, this isn't the case as it could directly read from the 
> minority partition.

I don't think it's that simple:
- k1 is in the minority partition *only*
- it is removed/updated in the majority partition
- it is then read in the minority partition, which still returns the stale pre-split value
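
Roughly, from a Hot Rod client's point of view (a minimal sketch assuming the current Java Hot Rod client API; host names, port and cache name are made up, and client B is assumed to only be able to reach nodes in the minority partition):

   import org.infinispan.client.hotrod.RemoteCache;
   import org.infinispan.client.hotrod.RemoteCacheManager;
   import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

   public class SplitBrainStaleRead {
      public static void main(String[] args) {
         // Client A can reach the majority partition, client B only the minority
         // (addresses are made up for illustration).
         ConfigurationBuilder a = new ConfigurationBuilder();
         a.addServer().host("node-majority").port(11222);
         ConfigurationBuilder b = new ConfigurationBuilder();
         b.addServer().host("node-minority").port(11222);

         RemoteCacheManager majority = new RemoteCacheManager(a.build());
         RemoteCacheManager minority = new RemoteCacheManager(b.build());

         RemoteCache<String, String> majorityCache = majority.getCache("someCache");
         RemoteCache<String, String> minorityCache = minority.getCache("someCache");

         // k1 was written before the split and all its owners ended up in the
         // minority partition. After the split:
         majorityCache.put("k1", "v2");           // update (or remove) succeeds in the majority
         String stale = minorityCache.get("k1");  // still returns the pre-split value "v1"

         majority.stop();
         minority.stop();
      }
   }

So a client that reads through the minority partition still sees the old value even though the majority has already moved on.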
 
> 
>> One other option is to perform a more complicated post-merge state transfer, 
>> in which each partition sends all the data it has to all the other 
>> partitions, and on the receiving end each node has a "conflict resolution" 
>> component that can merge two values. That is definitely more complicated 
>> than just going with a primary partition, though.
> 
> That sounds massively expensive.
If you move only the keys + versions around (not the values), it shouldn't be that bad from a load perspective.
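Something along these lines (a standalone sketch, not an existing Infinispan API, using a plain long per key purely for illustration): each partition ships only a key -> version map, and full values are pulled just for the keys that actually diverged.

   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.Map;
   import java.util.Set;

   public class VersionDigestMerge {

      /** Returns the keys whose values must actually be transferred/reconciled. */
      static Set<String> keysNeedingTransfer(Map<String, Long> localVersions,
                                             Map<String, Long> remoteVersions) {
         Set<String> conflicting = new HashSet<>();
         for (Map.Entry<String, Long> remote : remoteVersions.entrySet()) {
            Long local = localVersions.get(remote.getKey());
            // Key missing locally, or versions diverged: the value itself is needed.
            if (local == null || !local.equals(remote.getValue())) {
               conflicting.add(remote.getKey());
            }
         }
         return conflicting;
      }

      public static void main(String[] args) {
         Map<String, Long> local = new HashMap<>();
         local.put("k1", 3L);
         local.put("k2", 7L);

         Map<String, Long> remote = new HashMap<>();
         remote.put("k1", 3L);  // same version, nothing to transfer
         remote.put("k2", 9L);  // diverged, pull/merge the value
         remote.put("k3", 1L);  // unknown locally, pull the value

         System.out.println(keysNeedingTransfer(local, remote)); // [k2, k3]
      }
   }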
>  I think the right solution at this point is entry versioning using vector 
> clocks and the vector clocks are exchanged and compared during a merge.  Not 
> the entire dataset.
+1
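For reference, the comparison during the merge would be the textbook vector clock partial order (a generic sketch, not Infinispan code): two versions of the same entry either have a happens-before relation, or they are concurrent and need a conflict resolution step.

   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.Map;
   import java.util.Set;

   public class VectorClock {
      enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

      final Map<String, Long> counters = new HashMap<>();

      void increment(String node) {
         Long c = counters.get(node);
         counters.put(node, c == null ? 1L : c + 1);
      }

      static Order compare(VectorClock a, VectorClock b) {
         boolean aBehind = false, bBehind = false;
         for (String node : union(a, b)) {
            long ca = a.counters.getOrDefault(node, 0L);
            long cb = b.counters.getOrDefault(node, 0L);
            if (ca < cb) aBehind = true;
            if (cb < ca) bBehind = true;
         }
         if (aBehind && bBehind) return Order.CONCURRENT; // neither dominates -> needs conflict resolution
         if (aBehind) return Order.BEFORE;
         if (bBehind) return Order.AFTER;
         return Order.EQUAL;
      }

      private static Set<String> union(VectorClock a, VectorClock b) {
         Set<String> nodes = new HashSet<>(a.counters.keySet());
         nodes.addAll(b.counters.keySet());
         return nodes;
      }
   }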
> 
>> One final point... when a node comes back online and it has a local cache 
>> store, it is very much as if we had a merge view. The current approach is to 
>> join as if the node didn't have any data, then delete everything from the 
>> cache store that is not mapped to the node in the consistent hash. Obviously 
>> that can lead to consistency problems, just like our current merge 
>> algorithm. It would be nice if we could handle both these cases the same way.
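
For context, the current rejoin behaviour described above boils down to roughly this (hypothetical interfaces, not the actual CacheStore/ConsistentHash SPI): locally persisted entries the node no longer owns are dropped without ever being compared against the cluster's current values, which is where the consistency problem comes from.

   import java.util.Set;

   interface PersistedStore {
      Set<Object> loadAllKeys();
      void delete(Object key);
   }

   interface OwnershipFunction {
      boolean isOwnedByLocalNode(Object key);
   }

   class RejoinPruning {
      static void pruneStaleEntries(PersistedStore store, OwnershipFunction ch) {
         for (Object key : store.loadAllKeys()) {
            if (!ch.isOwnedByLocalNode(key)) {
               // The entry may hold an update the rest of the cluster never saw;
               // it is simply discarded here.
               store.delete(key);
            }
         }
      }
   }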

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)




