Pedro and I have been discussing with the LEADS guys their experience of 
Map/Reduce, especially around stability during topology changes.

This ties into the .size() thread you guys have been exchanging on (I could 
only read it partially).

On the requirements, theirs are pretty straightforward and, I think, expected 
from most users.
They are fine with inconsistencies for entries created/updated/deleted between 
the start and the end of the M/R job.
They are *not* fine with seeing the same key/value several times for the 
duration of the M/R execution. AFAIK this can happen when a topology change 
occurs.

Here is a proposal.
Why not run the M/R job per segment rather than per node?
The point is that segments are stable across topology changes. The M/R tasks 
would then be about iterating over the keys in a given segment.
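
To make it concrete, here is a rough sketch of what a per-segment task 
contract could look like. SegmentMapTask and its signature are made-up names 
to illustrate the idea, not existing API:

import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical contract: the mapper sees one segment at a time.
// Segments are stable across topology changes, so re-running one
// segment cannot duplicate keys from another segment.
interface SegmentMapTask<K, V, KOut, VOut> {
    void mapSegment(int segmentId,
                    Iterable<Map.Entry<K, V>> segmentEntries,
                    BiConsumer<KOut, VOut> collector);
}

// Example: a word-count mapper written against that contract.
class WordCountMapper implements SegmentMapTask<String, String, String, Integer> {
    @Override
    public void mapSegment(int segmentId,
                           Iterable<Map.Entry<String, String>> entries,
                           BiConsumer<String, Integer> collector) {
        for (Map.Entry<String, String> e : entries) {
            for (String word : e.getValue().split("\\s+")) {
                collector.accept(word, 1);
            }
        }
    }
}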

The M/R request would send the task for each segment to the node where that 
segment is primary.
(We can imagine interesting things like sending it to one of the backups for 
workload optimization purposes, or sending it to both primary and backups to 
compare results.)
The M/R requester would be in an interesting situation: it could detect that a 
segment's M/R never returns and trigger a new computation on a node other than 
the one it initially targeted.
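
A minimal sketch of what that requester-side retry could look like, assuming a 
hypothetical SegmentClient transport (submitToNode and ownersOf are made up 
here; in practice this would sit on our existing RPC layer):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical transport: submit a segment task to a node and list the
// owners of a segment (primary first, then backups). Not existing API.
interface SegmentClient<R> {
    CompletableFuture<R> submitToNode(String node, int segmentId);
    List<String> ownersOf(int segmentId);
}

final class SegmentRetryCoordinator<R> {
    private final SegmentClient<R> client;

    SegmentRetryCoordinator(SegmentClient<R> client) { this.client = client; }

    CompletableFuture<R> runSegment(int segmentId, long timeoutSec) {
        return runOn(segmentId, 0, timeoutSec);
    }

    private CompletableFuture<R> runOn(int segmentId, int ownerIdx, long timeoutSec) {
        List<String> owners = client.ownersOf(segmentId);
        if (ownerIdx >= owners.size()) {
            return CompletableFuture.failedFuture(
                new IllegalStateException("no owner completed segment " + segmentId));
        }
        return client.submitToNode(owners.get(ownerIdx), segmentId)
            // If the node never answers within the timeout, retry the
            // same segment on the next owner.
            .orTimeout(timeoutSec, TimeUnit.SECONDS)
            .exceptionallyCompose(t -> runOn(segmentId, ownerIdx + 1, timeoutSec));
    }
}

The requester would then hold one such future per segment and simply wait for 
all of them.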

One tricky question around that is what happens when the M/R job stores data 
in an intermediary state. We need some way to expose the user indirectly to 
segments so that we can evict per-segment intermediary caches in case of 
failure or retry.
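
One way to make that cleanup cheap would be to key the intermediary entries by 
segment, so a retry evicts only its own segment's results. IntermediateKey and 
IntermediateStore below are hypothetical names for illustration:

import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper that tags every intermediary key with the
// segment it came from.
final class IntermediateKey<K> {
    final int segmentId;
    final K key;
    IntermediateKey(int segmentId, K key) { this.segmentId = segmentId; this.key = key; }
    @Override public boolean equals(Object o) {
        return o instanceof IntermediateKey<?> other
            && segmentId == other.segmentId && Objects.equals(key, other.key);
    }
    @Override public int hashCode() { return 31 * segmentId + Objects.hashCode(key); }
}

final class IntermediateStore<K, V> {
    private final Map<IntermediateKey<K>, V> cache = new ConcurrentHashMap<>();

    void put(int segmentId, K key, V value) {
        cache.put(new IntermediateKey<>(segmentId, key), value);
    }

    // On failure or retry of one segment's task, drop only that
    // segment's intermediary entries; other segments stay valid.
    void evictSegment(int segmentId) {
        cache.keySet().removeIf(k -> k.segmentId == segmentId);
    }
}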

But before getting ahead of ourselves, what do you think of the general idea? 
Even without a retry framework, this approach would be more stable than our 
current per-node approach during topology changes and would improve 
dependability.

Emmanuel