GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/1051
[FLINK-2460] [runtime] Check parent state in isReleased() check of
partition view
Adds a check for the state of the parent partition a partition **view**
belongs to (the view consumes a sub partition).
During cancelling there was a possible interleaving when a released
partition was not noticed by the consumer. The issue for this PR reported the
following stack trace:
```bash
"SortMerger Reading Thread" daemon prio=10 tid=0x00007f7740107800
nid=0x13cbc runnable [0x00007f7722bb1000]
java.lang.Thread.State: RUNNABLE
at
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextLookAhead(LocalInputChannel.java:256)
at
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:120)
- locked <0x00000000ef9c1028> (a java.lang.Object)
at
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:377)
- locked <0x00000000ef9c0da8> (a java.lang.Object)
at
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:400)
at
org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:79)
at
org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
at
org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ReadingThread.go(UnilateralSortMerger.java:958)
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:781)
"CoGroup (CoGroup at groupReduceOnNeighbors(Graph.java:1405)) (4/4)" daemon
prio=10 tid=0x00007f772c45a800 nid=0x13c9e waiting for monitor entry
[0x00007f7721ba1000]
java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.releaseAllResources(SingleInputGate.java:322)
- waiting to lock <0x00000000ef9c0da8> (a java.lang.Object)
at
org.apache.flink.runtime.io.network.NetworkEnvironment.unregisterTask(NetworkEnvironment.java:379)
- locked <0x00000000d5519898> (a java.lang.Object)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:674)
at java.lang.Thread.run(Thread.java:701)
```
This PR adds a test that verifies that the parent release state is checked
by the respective views as well.
**Note**: This needs to be merged to `release-0.9` as well.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink cogroup_closer-2460
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1051.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1051
----
commit 7c99da9c716238349e5bfd17a1d48c6a338e5f76
Author: Ufuk Celebi <[email protected]>
Date: 2015-08-10T13:15:07Z
[FLINK-2460] [runtime] Check parent state in isReleased() check of
partition view
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---