On 10/09/2014 04:41 PM, Dan Berindei wrote:
>
>
> On Thu, Oct 9, 2014 at 3:40 PM, William Burns <[email protected]> wrote:
>
>     Actually this was something I was hoping to get to possibly in the
>     near future.
>
>     I already have to do https://issues.jboss.org/browse/ISPN-4358, which
>     will require rewriting parts of the distributed entry iterator.  In
>     doing so I was planning on breaking this out into a more generic
>     framework where you could run a given operation by segment, with a
>     guarantee that it is only run once per entry.  I was then thinking I
>     could try to move M/R on top of this so that it, too, becomes
>     resilient to rehash events.
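
To make that idea concrete, here is a rough sketch of what such a
per-segment framework could look like (all names here are hypothetical;
this is not an existing Infinispan API):

   import java.util.Map;
   import java.util.Set;
   import java.util.concurrent.ConcurrentHashMap;

   // Hypothetical sketch of a per-segment operation framework.
   interface SegmentOperation<K, V> {
      void apply(K key, V value);
   }

   class PerSegmentRunner<K, V> {
      private final Set<Integer> completedSegments = ConcurrentHashMap.newKeySet();

      // Applies the operation to every entry of one segment, then marks the
      // segment as done.  A retry after a rehash skips segments that already
      // completed, so no entry is processed twice.
      void runSegment(int segmentId, Iterable<Map.Entry<K, V>> segmentEntries,
                      SegmentOperation<K, V> op) {
         if (completedSegments.contains(segmentId)) {
            return; // segment already ran to completion before the rehash
         }
         for (Map.Entry<K, V> entry : segmentEntries) {
            op.apply(entry.getKey(), entry.getValue());
         }
         completedSegments.add(segmentId);
      }
   }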
>
>     Additional comments inline.
>
>     On Thu, Oct 9, 2014 at 8:18 AM, Emmanuel Bernard
>     <[email protected]> wrote:
>     > Pedro and I have been having discussions with the LEADS guys on their
>     > experience of Map/Reduce, especially around stability during topology
>     > changes.
>     >
>     > This ties to the .size() thread you guys have been exchanging on (I
>     > could only read it partially).
>     >
>     > On the requirements, theirs are pretty straightforward and, I think,
>     > expected by most users.
>     > They are fine with inconsistencies for entries created/updated/deleted
>     > between the M/R start and the end.
>
>     There is no way we can fix this without adding a very strict isolation
>     level like SERIALIZABLE.
>
>
>     > They are *not* fine with seeing the same key/value several times during
>     > the M/R execution. AFAIK this can happen when a topology change occurs.
>
>     This can happen if an entry was processed on one node and a rehash then
>     migrates it to another node, which processes it again.
>
>     >
>     > Here is a proposal.
>     > Why not run the M/R job not per node but rather per segment?
>     > The point is that segments are stable across topology changes. The M/R
>     > tasks would then be about iterating over the keys in a given segment.
>     >
>     > The M/R request would send one task per segment, to the node where that
>     > segment is primary.
>
>     This is exactly what the iterator does today, except that it also
>     watches for rehashes and re-sends the request to the new owner when a
>     segment moves between nodes.
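
For reference, that re-send logic boils down to something like the
following (a condensed sketch; the ConsistentHash call is from the
Infinispan API, while resend() stands in for the real request dispatch):

   import java.util.Map;
   import org.infinispan.distribution.ch.ConsistentHash;
   import org.infinispan.remoting.transport.Address;

   // On a topology change, find in-flight segments whose primary owner
   // changed and re-send the per-segment request to the new owner.
   void onTopologyChange(ConsistentHash newCh,
                         Map<Integer, Address> inFlightSegments) {
      for (Map.Entry<Integer, Address> e : inFlightSegments.entrySet()) {
         int segment = e.getKey();
         Address newPrimary = newCh.locatePrimaryOwnerForSegment(segment);
         if (!newPrimary.equals(e.getValue())) {
            e.setValue(newPrimary);
            resend(segment, newPrimary);
         }
      }
   }

   void resend(int segment, Address target) {
      // dispatch the per-segment request to the new primary owner (elided)
   }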
>
>     > (We can imagine interesting things like sending it to one of the
>     > backups for workload optimization purposes, or sending it to both the
>     > primary and the backups and comparing the results.)
>     > The M/R requester would be in an interesting situation. It could detect
>     > that a segment's M/R never returns and trigger a new computation on a
>     > node other than the one initially targeted.
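
A requester-side watchdog for that could be as simple as the sketch
below (all names hypothetical; the timeout value is arbitrary):

   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.TimeUnit;

   // If a segment's result has not arrived within the timeout, resubmit
   // the task for that segment to a different node.
   class SegmentWatchdog {
      private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

      void watch(int segment, CompletableFuture<?> result) {
         timer.schedule(() -> {
            if (!result.isDone()) {
               resubmitElsewhere(segment); // hypothetical re-dispatch
            }
         }, 30, TimeUnit.SECONDS);
      }

      void resubmitElsewhere(int segment) {
         // pick another owner of the segment and send the task again (elided)
      }
   }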
>     >
>     > One tricky question is what happens when the M/R job stores data in an
>     > intermediary state. We need some way to expose the user indirectly to
>     > segments, so that we can evict per-segment intermediary caches in case
>     > of failure or retry.
>
>     This is one place I was thinking I would need to look into with
>     special care when doing a conversion like this.
>
>
> I'd rather not expose this to the user. Instead, we could split the
> intermediary values for each key by the source segment, and do the
> invalidation of the retried segments in our M/R framework (e.g. when we
> detect that the primary owner at the start of the map/combine phase is
> not an owner at all at the end).
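
In other words, the intermediate cache key would carry the source
segment, roughly like this (a sketch, not the actual M/R internals;
equals()/hashCode() over all three fields elided for brevity):

   // Keying intermediate values by their source segment lets the framework
   // drop everything a retried segment produced, without involving the user.
   final class IntermediateKey<K> {
      final String taskId;       // identifies the M/R job
      final int sourceSegment;   // segment the value was mapped from
      final K key;               // the intermediate key itself

      IntermediateKey(String taskId, int sourceSegment, K key) {
         this.taskId = taskId;
         this.sourceSegment = sourceSegment;
         this.key = key;
      }
   }

Invalidating a retried segment then means removing every IntermediateKey
whose sourceSegment matches it.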
>
> I think we have another problem with the publishing of intermediary
> values not being idempotent. The default configuration for the
> intermediate cache is non-transactional, and retrying the put(delta)
> command after a topology change could add the same intermediate values
> twice. A transactional intermediary cache should be safe, though,
> because the tx won't commit on the old owner until the new owner knows
> about the tx.
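
If we went that way, making the intermediate cache transactional is a
one-liner with the standard programmatic configuration (a sketch; the
real M/R code creates this cache internally):

   import org.infinispan.configuration.cache.Configuration;
   import org.infinispan.configuration.cache.ConfigurationBuilder;
   import org.infinispan.transaction.TransactionMode;

   // Configure the intermediate cache as transactional, so that a retried
   // put(delta) after a topology change cannot be applied twice.
   Configuration cfg = new ConfigurationBuilder()
         .transaction().transactionMode(TransactionMode.TRANSACTIONAL)
         .build();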

Can you elaborate on the transactional case?

Anyway, I think the retry mechanism should solve it. If we detect a
topology change during the iteration of segment _i_ and segment _i_ is
moved, then we can cancel the iteration, remove all the intermediate
values generated for segment _i_, and restart (on the new primary
owner).
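
A minimal sketch of that retry loop, with every helper standing in for
the real iterator/topology plumbing (checking the topology once at the
end for brevity; the real thing would also check during the iteration):

   // Hypothetical sketch: cancel, clean up and resubmit when segment i moves.
   class SegmentRetry {
      void processSegment(int segment) {
         int topologyAtStart = currentTopologyId();
         iterateSegment(segment);                // map/combine over segment i
         if (currentTopologyId() != topologyAtStart && !ownsSegment(segment)) {
            removeIntermediateValues(segment);   // discard segment i's output
            resubmitToPrimaryOwner(segment);     // restart on the new primary
         }
      }

      // Stubs for the real plumbing:
      int currentTopologyId() { return 0; }
      boolean ownsSegment(int segment) { return true; }
      void iterateSegment(int segment) { }
      void removeIntermediateValues(int segment) { }
      void resubmitToPrimaryOwner(int segment) { }
   }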
>
>
>     >
>     > But before getting ahead of ourselves, what do you think of the general
>     > idea? Even without a retry framework, this approach would be more stable
>     > than our current per-node approach during topology changes, and would
>     > improve dependability.
>
>     Doing it solely based on segments would remove the possibility of
>     duplicates.  However, without a mechanism to send a new request on
>     rehash, it would be possible to find only a subset of the values (if a
>     segment is moved away while it is being iterated).
>
>      >
>      > Emmanuel
_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
