[ 
https://issues.apache.org/jira/browse/FLINK-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709474#comment-14709474
 ] 

ASF GitHub Bot commented on FLINK-2460:
---------------------------------------

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/1051

    [FLINK-2460] [runtime] Check parent state in isReleased() check of 
partition view

    Adds a check for the state of the parent partition a partition **view** 
belongs to (the view consumes a sub partition).
    
    During cancelling there was a possible interleaving when a released 
partition was not noticed by the consumer. The issue for this PR reported the 
following stack trace:
    
    ```bash
    "SortMerger Reading Thread" daemon prio=10 tid=0x00007f7740107800 
nid=0x13cbc runnable [0x00007f7722bb1000]
       java.lang.Thread.State: RUNNABLE
        at 
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextLookAhead(LocalInputChannel.java:256)
        at 
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:120)
        - locked <0x00000000ef9c1028> (a java.lang.Object)
        at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:377)
        - locked <0x00000000ef9c0da8> (a java.lang.Object)
        at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:400)
        at 
org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:79)
        at 
org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
        at 
org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
        at 
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ReadingThread.go(UnilateralSortMerger.java:958)
        at 
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:781)
    
    "CoGroup (CoGroup at groupReduceOnNeighbors(Graph.java:1405)) (4/4)" daemon 
prio=10 tid=0x00007f772c45a800 nid=0x13c9e waiting for monitor entry 
[0x00007f7721ba1000]
       java.lang.Thread.State: BLOCKED (on object monitor)
        at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.releaseAllResources(SingleInputGate.java:322)
        - waiting to lock <0x00000000ef9c0da8> (a java.lang.Object)
        at 
org.apache.flink.runtime.io.network.NetworkEnvironment.unregisterTask(NetworkEnvironment.java:379)
        - locked <0x00000000d5519898> (a java.lang.Object)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:674)
        at java.lang.Thread.run(Thread.java:701)
    ```
    
    This PR adds a test that verifies that the parent release state is checked 
by the respective views as well.
    
    **Note**: This needs to be merged to `release-0.9` as well.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink cogroup_closer-2460

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1051.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1051
    
----
commit 7c99da9c716238349e5bfd17a1d48c6a338e5f76
Author: Ufuk Celebi <[email protected]>
Date:   2015-08-10T13:15:07Z

    [FLINK-2460] [runtime] Check parent state in isReleased() check of 
partition view

----


> ReduceOnNeighborsWithExceptionITCase failure
> --------------------------------------------
>
>                 Key: FLINK-2460
>                 URL: https://issues.apache.org/jira/browse/FLINK-2460
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Sachin Goel
>            Assignee: Ufuk Celebi
>
> I noticed a build error due to failure on this case. It was on a branch of my 
> fork, which didn't actually have anything to do with the failed test or the 
> runtime system at all.
> Here's the error log: 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/73695554/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to