[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709214#comment-14709214
 ] 

ASF GitHub Bot commented on FLINK-2089:
---------------------------------------

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/1050

    [FLINK-2089] [runtime] Fix illegal state in RecordWriter after partition write failure

    I'm waiting for feedback from a user on whether this fixes FLINK-2089,
    but this PR addresses a real problem either way.
    
    Record writers have a `clearBuffers` method, which is called by the task
    code in a finally block at the end of `invoke` (see `RegularPactTask`, for
    example). This call clears the buffers of the record serializers.
    
    The following illegal state can arise: a buffer has been published to a
    partition, but the serializers still hold a reference to it. When a
    serializer then tries to clear its current buffer, the buffer might already
    have been recycled (because it was published to the partition). This
    currently happens if an exception is thrown while writing the buffer to the
    partition.
    
    This PR replaces the write-and-clear calls with a try-catch-finally block
    and tests the expected behaviour in a new test. The removed tests are
    superseded by this new test.
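
The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `SketchBuffer`, `SketchPartition`, `SketchSerializer` and `WriteAndClearSketch` are hypothetical stand-ins, and the sketch assumes that the partition takes ownership of a buffer (and may recycle it) as soon as the buffer is handed over, even when the write fails. It uses a plain try/finally; the catch branch of the actual change is not reproduced here.

```java
import java.io.IOException;

// Hypothetical stand-in for a network buffer; NOT Flink's Buffer class.
final class SketchBuffer {
    private boolean recycled;

    void recycle() {
        if (recycled) {
            throw new IllegalStateException("Buffer has already been recycled.");
        }
        recycled = true;
    }
}

// Hypothetical partition. Assumption: it owns the buffer once add() is called
// and recycles it, even if the write itself fails afterwards.
final class SketchPartition {
    private final boolean failOnWrite;

    SketchPartition(boolean failOnWrite) {
        this.failOnWrite = failOnWrite;
    }

    void add(SketchBuffer buffer) throws IOException {
        buffer.recycle();
        if (failOnWrite) {
            throw new IOException("partition write failed");
        }
    }
}

// Hypothetical serializer that holds a reference to its current buffer.
final class SketchSerializer {
    private SketchBuffer current;

    void setBuffer(SketchBuffer buffer) { current = buffer; }

    SketchBuffer getBuffer() { return current; }

    void dropReference() { current = null; }

    // Plays the role of clearBuffers: recycle whatever is still referenced.
    void clear() {
        if (current != null) {
            current.recycle();
            current = null;
        }
    }
}

final class WriteAndClearSketch {
    // Write-and-clear: if add() throws, the reference is never dropped.
    static void writeThenClear(SketchSerializer s, SketchPartition p) throws IOException {
        p.add(s.getBuffer());
        s.dropReference();
    }

    // The reference is dropped in a finally block, whether or not add() throws.
    static void writeWithFinally(SketchSerializer s, SketchPartition p) throws IOException {
        try {
            p.add(s.getBuffer());
        } finally {
            s.dropReference();
        }
    }

    public static void main(String[] args) {
        SketchSerializer serializer = new SketchSerializer();
        serializer.setBuffer(new SketchBuffer());
        try {
            writeWithFinally(serializer, new SketchPartition(true));
        } catch (IOException e) {
            System.out.println("write failed: " + e.getMessage());
        }
        // Nothing left to recycle: the reference was dropped in the finally block.
        serializer.clear();
        System.out.println("clear() completed without IllegalStateException");
    }
}
```

With `writeThenClear`, a failed write leaves the serializer pointing at a buffer that the partition has already recycled, so a later `clear()` throws the same kind of `IllegalStateException` as in the issue. With `writeWithFinally`, the later `clear()` has nothing to recycle.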


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink illegal-2089-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1050.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1050
    
----
commit 88ca58b5e3e78354ca1cffee4e11b48011333c6b
Author: Ufuk Celebi <[email protected]>
Date:   2015-08-19T14:11:13Z

    [FLINK-2089] [runtime] Fix illegal state in RecordWriter after partition write failure

----


> "Buffer recycled" IllegalStateException during cancelling
> ---------------------------------------------------------
>
>                 Key: FLINK-2089
>                 URL: https://issues.apache.org/jira/browse/FLINK-2089
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>    Affects Versions: master
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>             Fix For: 0.9
>
>
> [~rmetzger] reported the following stack trace during cancelling of high 
> parallelism jobs:
> {code}
> Error: java.lang.IllegalStateException: Buffer has already been recycled.
> at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
> at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
> at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
> at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
> at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
> at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
> at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
> at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
> at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
> at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> This looks like a concurrent buffer pool release/buffer usage error. I'm 
> investigating this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
