[
https://issues.apache.org/jira/browse/FLINK-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144654#comment-15144654
]
Stephan Ewen edited comment on FLINK-3291 at 2/12/16 2:51 PM:
--------------------------------------------------------------
Thanks, [~greghogan] and [~ggevay] for looking into this. It's good some people
start going in-depth there.
Here are a few comments from what I thought initially when we implemented the
first object reuse versions:
- The initial idea of a contract for {{MutableObjectiterator.next(reuse)}}
was the following:
1. The caller may not hold onto {{reuse}} any more
2. The iterator implementor may not hold onto the returned object any
more.
Given that this was long ago (5 years probably, since I created that
interface), I am pretty sure that contract is not obeyed everywhere.
- In all cases, the non-reuse implementations should work without
{{serializer.createInstance()}}, because in some corner cases, neither Flink
nor Kryo/Objenesis manages to instantiate the object.
- Copying objects is sometimes necessary, but should be avoided where
possible. In microbenchmarks, the copies are often cheap (tuples of Strings,
Integers), in practice many objects are JSON, Avro, Thrift, and are hellishly
expensive to copy.
Especially in the non-reusing mode, I should be completely avoidable. The
object reuse-mode is mainly for programs with types that are mutable and
efficient to copy. Reuse can become more expensive the non-reuse if copies are
expensive.
was (Author: stephanewen):
Thanks, @Greg and [~ggevay] for looking into this. It's good some people start
going in-depth there.
Here are a few comments from what I thought initially when we implemented the
first object reuse versions:
- The initial idea of a contract for {{MutableObjectiterator.next(reuse)}}
was the following:
1. The caller may not hold onto {{reuse}} any more
2. The iterator implementor may not hold onto the returned object any
more.
Given that this was long ago (5 years probably, since I created that
interface), I am pretty sure that contract is not obeyed everywhere.
- In all cases, the non-reuse implementations should work without
{{serializer.createInstance()}}, because in some corner cases, neither Flink
nor Kryo/Objenesis manages to instantiate the object.
- Copying objects is sometimes necessary, but should be avoided where
possible. In microbenchmarks, the copies are often cheap (tuples of Strings,
Integers), in practice many objects are JSON, Avro, Thrift, and are hellishly
expensive to copy.
Especially in the non-reusing mode, I should be completely avoidable. The
object reuse-mode is mainly for programs with types that are mutable and
efficient to copy. Reuse can become more expensive the non-reuse if copies are
expensive.
> Object reuse bug in MergeIterator.HeadStream.nextHead
> -----------------------------------------------------
>
> Key: FLINK-3291
> URL: https://issues.apache.org/jira/browse/FLINK-3291
> Project: Flink
> Issue Type: Bug
> Components: Distributed Runtime
> Affects Versions: 1.0.0
> Reporter: Gabor Gevay
> Assignee: Gabor Gevay
> Priority: Critical
>
> MergeIterator.HeadStream.nextHead saves a reference into `this.head` of the
> `reuse` object that it got as an argument. This object might be modified
> later by the caller.
> This actually happens when ReduceDriver.run calls input.next (which will
> actually be MergeIterator.next(E reuse)) in the inner while loop of the
> objectReuseEnabled branch, and that calls top.nextHead with the reference
> that it got from ReduceDriver, which erroneously saves the reference, and
> then ReduceDriver later uses that same object for doing the reduce.
> Another way in which this fails is when MergeIterator.next(E reuse) gives
> `reuse` to different `top`s in different calls, and then the heads end up
> being the same object.
> You can observe the latter situation in action by running ReducePerformance
> here:
> https://github.com/ggevay/flink/tree/merge-iterator-object-reuse-bug
> Set memory to -Xmx200m (so that the MergeIterator actually has merging to
> do), put a breakpoint at the beginning of MergeIterator.next(reuse), and then
> watch `reuse`, and the heads of the first two elements of `this.heap` in the
> debugger. They will get to be the same object after hitting continue about 6
> times.
> You can also look at the count that is printed at the end, which shouldn't be
> larger than the key range. Also, if you look into the output file
> /tmp/xxxobjectreusebug, for example the key 999977 appears twice.
> The good news is that I think I can see an easy fix that doesn't affect
> performance: MergeIterator.HeadStream could have a reuse object of its own as
> a member, and give that to iterator.next in nextHead(E reuse). And then we
> wouldn't need the overload of nextHead that has the reuse parameter, and
> MergeIterator.next(E reuse) could just call its other overload.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)