[ https://issues.apache.org/jira/browse/FLINK-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709474#comment-14709474 ]
ASF GitHub Bot commented on FLINK-2460:
---------------------------------------
GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/1051
[FLINK-2460] [runtime] Check parent state in isReleased() check of partition view
Adds a check of the state of the parent partition to which a partition **view** belongs (the view consumes a subpartition of that parent).
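A minimal Java sketch of the idea, assuming a simplified model: `ParentPartition`, `SubpartitionView`, `releaseView()` and the class `ParentReleaseCheck` are hypothetical names invented for illustration and are not Flink's actual classes or API. The only point shown is that a view reports itself as released when either it or its parent partition has been released.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ParentReleaseCheck {

    // Hypothetical stand-in for the parent result partition.
    static final class ParentPartition {
        private final AtomicBoolean released = new AtomicBoolean(false);

        void release() {
            released.set(true);
        }

        boolean isReleased() {
            return released.get();
        }
    }

    // Hypothetical view that consumes one subpartition of the parent.
    static final class SubpartitionView {
        private final ParentPartition parent;
        private final AtomicBoolean released = new AtomicBoolean(false);

        SubpartitionView(ParentPartition parent) {
            this.parent = parent;
        }

        void releaseView() {
            released.set(true);
        }

        // The view counts as released if it was released itself OR if its
        // parent partition was released (e.g. during task cancellation).
        boolean isReleased() {
            return released.get() || parent.isReleased();
        }
    }

    public static void main(String[] args) {
        ParentPartition parent = new ParentPartition();
        SubpartitionView view = new SubpartitionView(parent);
        System.out.println(view.isReleased()); // false
        parent.release();
        System.out.println(view.isReleased()); // true
    }
}
```

The flags are `AtomicBoolean`s in this sketch because the release and the check typically happen on different threads (the canceling task thread and the consuming reader thread), so the consumer must be able to see the update.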
During cancellation there was a possible interleaving in which the consumer did not notice that a partition had been released. The issue for this PR reported the following stack trace:
```text
"SortMerger Reading Thread" daemon prio=10 tid=0x00007f7740107800 nid=0x13cbc runnable [0x00007f7722bb1000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextLookAhead(LocalInputChannel.java:256)
    at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:120)
    - locked <0x00000000ef9c1028> (a java.lang.Object)
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:377)
    - locked <0x00000000ef9c0da8> (a java.lang.Object)
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:400)
    at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:79)
    at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
    at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
    at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ReadingThread.go(UnilateralSortMerger.java:958)
    at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:781)

"CoGroup (CoGroup at groupReduceOnNeighbors(Graph.java:1405)) (4/4)" daemon prio=10 tid=0x00007f772c45a800 nid=0x13c9e waiting for monitor entry [0x00007f7721ba1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.releaseAllResources(SingleInputGate.java:322)
    - waiting to lock <0x00000000ef9c0da8> (a java.lang.Object)
    at org.apache.flink.runtime.io.network.NetworkEnvironment.unregisterTask(NetworkEnvironment.java:379)
    - locked <0x00000000d5519898> (a java.lang.Object)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:674)
    at java.lang.Thread.run(Thread.java:701)
```
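In the trace, the reading thread is RUNNABLE inside `getNextLookAhead` while holding lock `<0x00000000ef9c0da8>` (taken in `requestPartitions`), and the task thread is BLOCKED waiting for that same lock in `releaseAllResources`. The following Java model is a hypothetical simplification, not Flink's actual code: `Partition`, `View`, `lookAhead` and the `checkParent` toggle are invented names. It assumes the consumer polls in a loop that exits only on data or on release, and shows why such a loop never exits if the view ignores its parent's state.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class LookAheadLoopSketch {

    // Hypothetical parent partition; only its release flag matters here.
    static final class Partition {
        volatile boolean released;
    }

    // Hypothetical view. checkParent toggles between checking only the
    // view's own state and also checking the parent partition's state.
    static final class View {
        private final Partition parent;
        private final boolean checkParent;
        private final Queue<String> buffers = new ArrayDeque<>(); // stays empty here
        private volatile boolean released; // not set by cancellation in this model

        View(Partition parent, boolean checkParent) {
            this.parent = parent;
            this.checkParent = checkParent;
        }

        String getNextBuffer() {
            return buffers.poll();
        }

        boolean isReleased() {
            return released || (checkParent && parent.released);
        }
    }

    // Simplified consumer loop: poll for a buffer and stop once the view
    // reports itself released. maxSpins only bounds the demo; a real
    // loop without the bound would spin forever while holding its lock.
    static String lookAhead(View view, int maxSpins) {
        for (int i = 0; i < maxSpins; i++) {
            if (view.getNextBuffer() != null) {
                return "DATA";
            }
            if (view.isReleased()) {
                return "RELEASED";
            }
        }
        return "STUCK";
    }

    public static void main(String[] args) {
        Partition partition = new Partition();
        partition.released = true; // cancellation released the parent partition

        System.out.println(lookAhead(new View(partition, false), 1_000)); // STUCK
        System.out.println(lookAhead(new View(partition, true), 1_000));  // RELEASED
    }
}
```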
This PR also adds a test verifying that the respective views check the release state of their parent partition as well.
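The property under test can be sketched without JUnit as follows. This is an illustrative assumption, not the test added in the PR: `Partition`, `View`, `createView()` and `testParentReleaseIsVisibleInAllViews()` are hypothetical names. It checks that releasing the parent is visible through every view, while releasing a single view leaves its siblings untouched.

```java
public class ParentReleaseTestSketch {

    // Hypothetical parent partition that hands out one view per subpartition.
    static final class Partition {
        private volatile boolean released;

        View createView() {
            return new View(this);
        }

        void release() {
            released = true;
        }

        boolean isReleased() {
            return released;
        }
    }

    // Hypothetical view whose release check also consults the parent.
    static final class View {
        private final Partition parent;
        private volatile boolean released;

        View(Partition parent) {
            this.parent = parent;
        }

        void release() {
            released = true;
        }

        boolean isReleased() {
            return released || parent.isReleased();
        }
    }

    static void testParentReleaseIsVisibleInAllViews() {
        Partition partition = new Partition();
        View first = partition.createView();
        View second = partition.createView();

        // Releasing one view must not affect its sibling.
        first.release();
        check(first.isReleased(), "released view reports released");
        check(!second.isReleased(), "sibling view is unaffected");

        // Releasing the parent must be visible through every view.
        partition.release();
        check(second.isReleased(), "view observes parent release");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        testParentReleaseIsVisibleInAllViews();
        System.out.println("ok");
    }
}
```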
**Note**: This needs to be merged to `release-0.9` as well.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink cogroup_closer-2460
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1051.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1051
----
commit 7c99da9c716238349e5bfd17a1d48c6a338e5f76
Author: Ufuk Celebi
Date: 2015-08-10T13:15:07Z
[FLINK-2460] [runtime] Check parent state in isReleased() check of partition view
----
> ReduceOnNeighborsWithExceptionITCase failure
> --------------------------------------------
>
> Key: FLINK-2460
> URL: https://issues.apache.org/jira/browse/FLINK-2460
> Project: Flink
> Issue Type: Bug
> Reporter: Sachin Goel
> Assignee: Ufuk Celebi
>
> I noticed a build error caused by a failure in this test case. It happened on a branch of my
> fork whose changes had nothing to do with the failed test or with the runtime system.
> Here's the error log:
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/73695554/log.txt
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)