Actually, this is something I was hoping to get to, possibly in the near future.

I already have to do https://issues.jboss.org/browse/ISPN-4358, which
will require rewriting parts of the distributed entry iterator.  While
doing that, I was planning on breaking this out into a more generic
framework where you could run a given operation per segment, with a
guarantee that it is only run once per entry.  I was then thinking I
could try to move M/R on top of this so that it is also resilient to
rehash events.
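
Roughly the shape I have in mind for the framework piece; all names
here are illustrative, not actual Infinispan API:

import java.util.Map;
import java.util.Set;

// Illustrative only -- not existing Infinispan API.
interface SegmentOperation<K, V, R> {
   // Invoked once per segment; the framework would guarantee each
   // entry in the segment is processed exactly once, even across
   // rehashes.
   R processSegment(int segmentId, Map<K, V> entriesInSegment);
}

interface SegmentCoordinator<K, V> {
   // Runs the operation over the given segments, re-dispatching any
   // segment whose owner changes before that segment completes.
   <R> Map<Integer, R> run(SegmentOperation<K, V, R> op,
                           Set<Integer> segments);
}

M/R would then sit on top of this as one such operation for the
map/combine phase, keyed by segment.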

Additional comments inline.

On Thu, Oct 9, 2014 at 8:18 AM, Emmanuel Bernard <[email protected]> wrote:
> Pedro and I have been having discussions with the LEADS guys about
> their experience of Map/Reduce, especially around stability during
> topology changes.
>
> This ties into the .size() thread you guys have been exchanging on (I
> could only read it partially).
>
> On the requirements side, theirs are pretty straightforward and, I
> think, expected by most users.
> They are fine with inconsistencies from entries created/updated/deleted
> between the M/R start and the end.

There is no way we can fix this without adding a very strict isolation
level like SERIALIZABLE.

> They are *not* fine with seeing the same key/value several times for
> the duration of the M/R execution. This AFAIK can happen when a
> topology change occurs.

This can happen if an entry is processed on one node and a rehash then
migrates it to another node, which processes it again.

>
> Here is a proposal.
> Why not run the M/R job per segment rather than per node?
> The point is that segments are stable across topology changes. The M/R tasks 
> would then be about iterating over the keys in a given segment.
>
> The M/R request would send the task per segment to each node where
> that segment is primary.

This is exactly what the entry iterator does today, except it also
watches for rehashes so it can re-send the request to the new owner
when a segment moves between nodes.
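
To make that concrete, the dispatch side could look roughly like the
sketch below; SegmentLocator and Rpc are hypothetical stand-ins for the
consistent hash and the transport layer, not real classes:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PerSegmentDispatch {
   interface NodeAddress {}
   interface SegmentLocator { NodeAddress primaryOwner(int segment); }
   interface Rpc { void sendSegmentTask(NodeAddress node, Set<Integer> segments); }

   static void dispatch(SegmentLocator hash, Rpc rpc, int numSegments) {
      // Group segments by their current primary owner so each node
      // gets a single request covering every segment it is primary for.
      Map<NodeAddress, Set<Integer>> byPrimary = new HashMap<>();
      for (int segment = 0; segment < numSegments; segment++) {
         byPrimary.computeIfAbsent(hash.primaryOwner(segment),
               n -> new HashSet<>()).add(segment);
      }
      byPrimary.forEach(rpc::sendSegmentTask);
   }
}

The iterator additionally listens for topology changes and redoes this
grouping for any segment that has not completed yet.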

> (We can imagine interesting things like sending it to one of the
> backups for workload optimization purposes, or sending it to both
> primary and backups and doing comparisons.)
> The M/R requester would be in an interesting situation. It could
> detect that a segment M/R never returns and trigger a new computation
> on a node other than the one it was initially sent to.
>
> One tricky question is when the M/R job stores data in an intermediary
> state. We need some way to indirectly expose segments to the user so
> that we can evict per-segment intermediary caches in case of failure
> or retry.

This is one of the places I was thinking I would need to take special
care with when doing a conversion like this.
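
For example, something along the lines of keying the intermediate
values by segment, so that a failed or retried segment's state can be
dropped wholesale without touching other segments.  Purely a sketch,
nothing Infinispan-specific:

import java.util.Objects;

final class SegmentScopedKey<K> {
   final int segment;        // segment the source entry belonged to
   final K intermediateKey;  // key emitted by the map/combine phase

   SegmentScopedKey(int segment, K intermediateKey) {
      this.segment = segment;
      this.intermediateKey = intermediateKey;
   }

   @Override
   public boolean equals(Object o) {
      if (!(o instanceof SegmentScopedKey)) return false;
      SegmentScopedKey<?> other = (SegmentScopedKey<?>) o;
      return segment == other.segment
            && Objects.equals(intermediateKey, other.intermediateKey);
   }

   @Override
   public int hashCode() {
      return 31 * segment + Objects.hashCode(intermediateKey);
   }
}

Evicting the intermediary cache for a failed segment then becomes
removing every key carrying that segment id (or simply keeping one
intermediary map per segment).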

>
> But before getting ahead of ourselves, what do you think of the
> general idea? Even without a retry framework, this approach would be
> more stable than our current per-node approach during topology changes
> and improve dependability.

Doing it solely based on segment would remove the possibility of
having duplicates.  However, without a mechanism to send a new request
on rehash, it would be possible to find only a subset of the values
(if a segment is removed from a node while we are iterating on it).
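
The retry bookkeeping I have in mind is basically: remember which
segments have finished, and when a rehash moves an unfinished segment,
queue it for its new primary owner.  Again only an illustrative sketch:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class SegmentRetryTracker {
   private final Set<Integer> completed = ConcurrentHashMap.newKeySet();
   private final Set<Integer> pending = ConcurrentHashMap.newKeySet();

   void markCompleted(int segment) {
      completed.add(segment);
      pending.remove(segment);
   }

   // Called from a topology-change hook: any segment that moved before
   // completing must be dispatched again to its new owner, otherwise
   // the result silently covers only a subset of the data.
   void onSegmentsMoved(Set<Integer> movedSegments) {
      for (int segment : movedSegments) {
         if (!completed.contains(segment)) {
            pending.add(segment);
         }
      }
   }

   Set<Integer> segmentsToRetry() {
      return pending;
   }
}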

>
> Emmanuel