GitHub user uce opened a pull request: https://github.com/apache/flink/pull/736
[FLINK-2089] [runtime] Fix possible duplicate buffer release

This PR contains multiple independent commits, which address issues discovered while debugging FLINK-2089.

- It adds the partition request backoff logic to local requests as well. The backoffs were introduced recently for remote requests, and I missed that the same problem could also happen for local input channels. The fix was easy: it moves the backoff logic to the abstract InputChannel, which both LocalInputChannel and RemoteInputChannel extend.

- The duplicate buffer release was hard to track down. In some corner cases, the record serializers incorrectly held references to buffers *after* writing them out to a result partition. In failure cases, the serializers recycled these buffers too early. The later recycling (by the component that is actually responsible for it, the result partition) then resulted in an IllegalStateException.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uce/incubator-flink cancel-2089

Alternatively, you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/736.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #736

----
commit e134f44640f2caafc6cff76fd100b11e3aa47515
Author: Ufuk Celebi <u...@apache.org>
Date: 2015-05-26T09:15:17Z

[runtime] [tests] Add TaskCancelTest

commit d6a33bfda84ea861e05e7a0aff6c529808c02bb2
Author: Ufuk Celebi <u...@apache.org>
Date: 2015-05-26T13:37:35Z

[FLINK-1636] [runtime] Add partition request backoff logic to LocalInputChannel

commit 7ea3ed2ad4c95c1bec0f2d558ba0d4faf9716f14
Author: Ufuk Celebi <u...@apache.org>
Date: 2015-05-27T12:49:01Z

[FLINK-2089] Fix possible duplicate buffer release

Problem: RecordWriter instances have stateful serializers, which include the buffer that they currently work with.
If the serializer state is not cleared correctly by the writers after writing a buffer to the respective result partition, it is possible that buffers are recycled multiple times in failure cases. This results in an IllegalStateException.

Solution: After writing a buffer to a ResultPartition, the RecordWriter makes sure that the serializer clears the reference to the respective buffer. The recycling of the buffer is then the responsibility of the result partition.

commit 91b6049ac371a62671c36a8280b0a60f1b2b7408
Author: Ufuk Celebi <u...@apache.org>
Date: 2015-05-27T16:26:46Z

[runtime] [logging] Fix log message
----
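The PR moves the partition request backoff from the remote channel into the abstract InputChannel so that local channels retry the same way. The PR does not describe the backoff policy itself. The Java sketch below therefore assumes an exponential (doubling) delay between an initial and a maximum value. All class and method names in it (`InputChannelSketch`, `increaseBackoff`, `retriggerOrFail`, `LocalChannelSketch`, `RemoteChannelSketch`) are hypothetical and are not Flink's actual API. The sketch only illustrates the design: the retry state is kept once in the base class, and each subclass supplies its own way of requesting the partition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Flink's actual InputChannel API: the backoff state
// lives in the abstract base class, so every channel type retries the same way.
abstract class InputChannelSketch {

    private final int initialBackoffMs;
    private final int maxBackoffMs;

    // 0 = not backing off yet, -1 = backoff disabled, > 0 = current delay in ms.
    private int currentBackoffMs;

    InputChannelSketch(int initialBackoffMs, int maxBackoffMs) {
        if (initialBackoffMs < 0 || maxBackoffMs < initialBackoffMs) {
            throw new IllegalArgumentException("Invalid backoff bounds");
        }
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
        this.currentBackoffMs = initialBackoffMs == 0 ? -1 : 0;
    }

    int getCurrentBackoffMs() {
        return Math.max(currentBackoffMs, 0);
    }

    /** Assumed policy: start at the initial delay, then double up to the maximum. */
    protected boolean increaseBackoff() {
        if (currentBackoffMs < 0) {
            return false; // backoff disabled
        }
        if (currentBackoffMs == 0) {
            currentBackoffMs = initialBackoffMs; // first retry
            return true;
        }
        if (currentBackoffMs < maxBackoffMs) {
            currentBackoffMs = Math.min(currentBackoffMs * 2, maxBackoffMs);
            return true;
        }
        return false; // maximum reached, give up
    }

    /** Channel-specific way of (re)requesting the producer's partition. */
    abstract void requestPartition();

    /** Called when the requested partition was not found (yet). */
    void retriggerOrFail() {
        if (increaseBackoff()) {
            // A real channel would schedule this after getCurrentBackoffMs(); the sketch calls it directly.
            requestPartition();
        } else {
            throw new IllegalStateException("Partition not found after maximum backoff");
        }
    }
}

class LocalChannelSketch extends InputChannelSketch {
    final List<Integer> requestedWithBackoff = new ArrayList<>();

    LocalChannelSketch(int initialBackoffMs, int maxBackoffMs) {
        super(initialBackoffMs, maxBackoffMs);
    }

    @Override
    void requestPartition() {
        // Local case: would look up the partition in the same TaskManager.
        requestedWithBackoff.add(getCurrentBackoffMs());
    }
}

class RemoteChannelSketch extends InputChannelSketch {
    int networkRequests;

    RemoteChannelSketch(int initialBackoffMs, int maxBackoffMs) {
        super(initialBackoffMs, maxBackoffMs);
    }

    @Override
    void requestPartition() {
        networkRequests++; // remote case: would send a request over the network
    }
}

public class BackoffDemo {
    public static void main(String[] args) {
        LocalChannelSketch local = new LocalChannelSketch(100, 800);
        try {
            while (true) {
                local.retriggerOrFail();
            }
        } catch (IllegalStateException e) {
            System.out.println(local.requestedWithBackoff + " then: " + e.getMessage());
        }
    }
}
```

With an initial backoff of 100 ms and a maximum of 800 ms, `BackoffDemo` prints `[100, 200, 400, 800] then: Partition not found after maximum backoff`. Because the policy sits in the base class, a fix to it applies to local and remote channels alike. That is the point of moving it out of the remote channel.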
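The duplicate release described in FLINK-2089 comes down to buffer ownership. After a buffer is written to the result partition, the serializer must drop its reference to that buffer without recycling it. The self-contained Java sketch below reproduces the failure mode with toy stand-ins: `ToyBuffer`, `ToySerializer`, `ToyPartition` and `ToyRecordWriter`. These are hypothetical and are not Flink's Buffer, RecordWriter or ResultPartition classes. The toy buffer throws an IllegalStateException when recycled twice. The `clearAfterWrite` flag switches between the old behaviour and the fix described in the commit message.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stand-ins (hypothetical, not Flink's Buffer/RecordWriter/ResultPartition API)
// that reproduce the duplicate-release failure mode described in FLINK-2089.
final class ToyBuffer {
    private int refCount = 1;

    void recycle() {
        if (refCount <= 0) {
            throw new IllegalStateException("Buffer has already been recycled.");
        }
        refCount--;
    }
}

final class ToySerializer {
    private ToyBuffer current; // the buffer the serializer is currently writing into

    void setBuffer(ToyBuffer buffer) { current = buffer; }

    ToyBuffer getBuffer() { return current; }

    /** Forget the buffer without recycling it: ownership has moved elsewhere. */
    void clearCurrentBuffer() { current = null; }

    /** Failure-path cleanup: recycles whatever buffer the serializer still references. */
    void clear() {
        if (current != null) {
            current.recycle();
            current = null;
        }
    }
}

final class ToyPartition {
    private final Deque<ToyBuffer> queued = new ArrayDeque<>();

    void add(ToyBuffer buffer) { queued.add(buffer); }

    /** The partition recycles the buffers it owns, e.g. when the task is cancelled. */
    void release() {
        while (!queued.isEmpty()) {
            queued.poll().recycle();
        }
    }
}

final class ToyRecordWriter {
    private final ToySerializer serializer = new ToySerializer();
    private final ToyPartition partition;
    private final boolean clearAfterWrite; // true = behaviour after the fix

    ToyRecordWriter(ToyPartition partition, boolean clearAfterWrite) {
        this.partition = partition;
        this.clearAfterWrite = clearAfterWrite;
    }

    void writeFullBuffer() {
        serializer.setBuffer(new ToyBuffer());
        partition.add(serializer.getBuffer()); // ownership moves to the partition here
        if (clearAfterWrite) {
            serializer.clearCurrentBuffer();
        }
    }

    void clearOnFailure() { serializer.clear(); }
}

public class BufferOwnershipDemo {
    static String run(boolean clearAfterWrite) {
        ToyPartition partition = new ToyPartition();
        ToyRecordWriter writer = new ToyRecordWriter(partition, clearAfterWrite);
        writer.writeFullBuffer();
        try {
            writer.clearOnFailure(); // failure path: writer cleans up its serializer
            partition.release();     // then the partition recycles what it owns
            return "ok";
        } catch (IllegalStateException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + run(false));
        System.out.println("with fix:    " + run(true));
    }
}
```

`run(false)` returns `error: Buffer has already been recycled.` and `run(true)` returns `ok`. The fix clears the reference instead of recycling the buffer, so each buffer has exactly one owner at any time. The serializer owns it until the write, and the partition owns it afterwards.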