[
https://issues.apache.org/jira/browse/FLINK-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377099#comment-16377099
]
ASF GitHub Bot commented on FLINK-8755:
---------------------------------------
GitHub user NicoK opened a pull request:
https://github.com/apache/flink/pull/5581
[FLINK-8755][FLINK-8786][network] fix two bugs in spilled and spillable
subpartition views
## What is the purpose of the change
1) `SpilledSubpartitionView#getNextBuffer()` relies on the backlog to
signal further data availability. However, if there are only events left in the
buffer queue, their buffers are not included in the backlog counting and
therefore, `isMoreAvailable` will be wrongly false here.
2) When processing the last in-memory buffer in
`SpillableSubpartitionView#getNextBuffer`, we always set the `isMoreAvailable`
flag of the returned `BufferAndBacklog` to false irrespective of what may be in
the spill writer.
This PR fixes both issues and heavily extends the unit tests in this
regard, hence the two were combined in a single PR. Please also note that this
PR is built upon #5549, #5550, #5551, and #5572 to reduce possible merge
conflicts - everything starting after FLINK-8694 is new.
## Brief change log
- rename `RecordWriter#closeBufferConsumer()` to `closeBufferBuilder()`
(internal method, we switched to buffer builders a while ago)
- make `AwaitableBufferAvailablityListener` (used by tests only) thread-safe
- fix `SpilledSubpartitionView#getNextBuffer()` to not only rely on the
backlog
- fix `SpillableSubpartitionView#getNextBuffer()` returning wrong
`isMoreAvailable` when processing the last in-memory buffer
- extended overall subpartition tests to also verify several other flags
that were added in the past but not covered appropriately, e.g.
`BufferAndBacklog#isMoreAvailable()` or `ResultSubpartitionView#isAvailable()`
- some minor code and documentation improvements
(more details in the individual commits)
## Verifying this change
This change added tests and can be verified as follows:
- added several checks to `PipelinedSubpartitionTest` and
`SpillableSubpartitionTest` via helper methods in `SubpartitionTestBase`
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): **no**
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no**
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
- The S3 file system connector: **no**
## Documentation
- Does this pull request introduce a new feature? (yes / no)
- If yes, how is the feature documented? (not applicable / docs /
JavaDocs / not documented)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/NicoK/flink flink-8786
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5581.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5581
----
commit 40e18c85563e1ef45ce89709fa3aa7613439e12d
Author: Nico Kruber <nico@...>
Date: 2018-02-20T17:04:12Z
[FLINK-8733][network] fix
SpillableSubpartition#spillFinishedBufferConsumers() not counting spilled bytes
commit 2721cabce0dd2be4bb4da4097ff4e6c7749498c1
Author: Nico Kruber <nico@...>
Date: 2018-02-20T17:05:54Z
[FLINK-8734][network] fix partition bytes counting and re-enable in tests
commit 375de6118a5d84d21b40b7c23438d09204ad664b
Author: Nico Kruber <nico@...>
Date: 2018-02-20T17:06:41Z
[hotfix][network] remove PowerMockRunner from RecordWriterTest
commit d0d3c7b026d5af3a57c47892501ab0e74e7172b2
Author: Nico Kruber <nico@...>
Date: 2018-02-20T17:07:02Z
[hotfix][network] various minor improvements
commit dbcfe73c41618c70e16884f2c723fc9a6a9dca4f
Author: Nico Kruber <nico@...>
Date: 2018-02-21T15:30:53Z
[hotfix][network] initialize SingleInputGate#enqueuedInputChannelsWithData
with the right size
commit 13f2e09240b9efb8163bb93dad52486fc2af65ac
Author: Nico Kruber <nico@...>
Date: 2018-02-21T16:09:31Z
[FLINK-8736][network] fix memory segment offsets for slices of slices being
wrong
commit f8363154ef2e03b99a471f627028ad50fc1271ab
Author: Nico Kruber <nico@...>
Date: 2018-02-23T17:07:31Z
fixup! [hotfix][network] various minor improvements
commit 8cf861f06ddc9c79fc61407ebe426213d1740ef7
Author: Piotr Nowojski <piotr.nowojski@...>
Date: 2018-02-23T10:20:21Z
[FLINK-8760][runtime] Correctly propagate moreAvailable flag through
SingleInputGate
Previously if we SingleInputGate was re-eqnqueuing an input channel,
isMoreAvailable
might incorrectly return false. This might caused some dead locks.
commit c7cda5463e7bba1d2f3f62006f6e4a71246efccb
Author: Piotr Nowojski <piotr.nowojski@...>
Date: 2018-02-23T10:37:37Z
[hotfix][tests] Deduplicate code in SingleInputGateTest
commit 954c019a70cf881de7410762977e033e2768c11e
Author: Piotr Nowojski <piotr.nowojski@...>
Date: 2018-02-23T11:11:14Z
[hotfix][runtime] Remove duplicated check
commit 55832a9ac1355e48fc9665967074ebc262da2f65
Author: Piotr Nowojski <piotr.nowojski@...>
Date: 2018-02-23T10:27:54Z
[hotfixu][tests] Do not hide original exception in
SuccessAfterNetworkBuffersFailureITCase
commit b6d98e99dd5cbf7cc0554cd83b81f3d2621e0057
Author: Piotr Nowojski <piotr.nowojski@...>
Date: 2018-02-23T10:28:20Z
[FLINK-8694][runtime] Fix notifyDataAvailable race condition
Before there was a race condition that might resulted in igonoring some
notifyDataAvailable calls.
This fixes the problem by moving buffersAvailable handling to Supartitions
and adds stress test
for flushAlways (without this fix this test is dead locking).
commit d65ffe94414e8bc4a6457955c8ca89bf68f537cb
Author: Nico Kruber <nico@...>
Date: 2018-02-26T14:07:49Z
fixup! [FLINK-8694][runtime] Fix notifyDataAvailable race condition
commit 26508abe76d7436c13b8415cd0411560cb31f4d4
Author: Nico Kruber <nico@...>
Date: 2018-02-26T15:39:01Z
fixup! [FLINK-8694][runtime] Fix notifyDataAvailable race condition
commit 00e9b252d218932dda3daf28f95cc8fd2e34cac8
Author: Nico Kruber <nico@...>
Date: 2018-02-22T13:11:13Z
[hotfix][network] rename RecordWriter#closeBufferConsumer() to
closeBufferBuilder()
commit 5ecd6f7215e164e6971cd305f16974639fccc7a3
Author: Nico Kruber <nico@...>
Date: 2018-02-22T13:17:06Z
[hotfix][network] various minor improvements
commit e3cff13a7785fe61236a735bb37a0fd4345fe13b
Author: Nico Kruber <nico@...>
Date: 2018-02-23T09:35:41Z
[hotfix][network][tests] make AwaitableBufferAvailablityListener thread-safe
This is called asynchronously by the spill writer and thus may need
synchronization on incrementing the counter but definately had visibility
issues with the counter. Using an AtomicLong fixes that.
commit 8a8470a6225bbdf8e0c962ba0d9bbbd75354d309
Author: Nico Kruber <nico@...>
Date: 2018-02-23T09:19:58Z
[FLINK-8755][network] fix SpilledSubpartitionView relying on the backlog
for determining whether more data is available
Fix SpilledSubpartitionView#getNextBuffer() to not only rely on the backlog:
instead it is sufficient to also return true if the next buffer is an event
since either there is a real buffer enqueued (reflected by the backlog) or
at
least one event.
commit 96dd3ffe5cbb6581c2bc80c86853c7249973c15e
Author: Nico Kruber <nico@...>
Date: 2018-02-23T11:13:20Z
[FLINK-8755][FLINK-8786][network] add and improve subpartition tests
+ also improve the subpartition tests in general to reduce some duplication
commit 603b1562a6ad257376057a1a3fc507604d83ffcd
Author: Nico Kruber <nico@...>
Date: 2018-02-26T15:27:44Z
[FLINK-8786][network] fix SpillableSubpartitionView#getNextBuffer returning
wrong isMoreAvailable when processing last in-memory buffer
When processing the last in-memory buffer in
SpillableSubpartitionView#getNextBuffer while the rest of the buffers are
spilled, need to rely on the spilled view's isAvailable instead of always
setting the isMoreAvailable flag of the returned BufferAndBacklog to false.
----
> SpilledSubpartitionView wrongly relys on the backlog for determining whether
> more data is available
> ---------------------------------------------------------------------------------------------------
>
> Key: FLINK-8755
> URL: https://issues.apache.org/jira/browse/FLINK-8755
> Project: Flink
> Issue Type: Sub-task
> Components: Network
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Blocker
> Fix For: 1.5.0
>
>
> {code}
> public BufferAndBacklog getNextBuffer() throws IOException,
> InterruptedException {
> //...
> int newBacklog = parent.decreaseBuffersInBacklog(current);
> return new BufferAndBacklog(current, newBacklog > 0, newBacklog,
> nextBufferIsEvent);
> {code}
> relies on the backlog to signal further data availability. However, if there
> are only events left in the buffer queue, their buffers are not included in
> the backlog counting and therefore, {{isMoreAvailable}} will be wrongly
> {{false}} here.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)