[
https://issues.apache.org/jira/browse/FLINK-28925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xintong Song closed FLINK-28925.
--------------------------------
Resolution: Fixed
master (1.16): 7ed817f2054a13c3e2754c37f7681d8fbdba4b41
> Fix the concurrency problem in hybrid shuffle
> ---------------------------------------------
>
> Key: FLINK-28925
> URL: https://issues.apache.org/jira/browse/FLINK-28925
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.16.0
> Reporter: Weijie Guo
> Assignee: Weijie Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Through TPC-DS testing and code analysis, I found some thread-safety problems
> in hybrid shuffle:
> # HsSubpartitionMemoryDataManager#consumeBuffer should return a
> readOnlySlice of the buffer to downstream instead of the original buffer. If
> the spilling thread processes a buffer while the downstream task is consuming
> the same buffer, the amount of data written to disk will be smaller than the
> actual size. To solve this, the consuming thread and the spilling thread
> should share the same data but not the same index.
> # HsSubpartitionMemoryDataManager#releaseSubpartitionBuffers should ignore
> a release decision if the buffer has already been removed from
> bufferIndexToContexts instead of throwing an exception. Note that although
> the actual release operation is synchronous, a double release can still
> happen, because non-global decisions do not need to be synchronized. In
> other words, the main task thread and the consumer thread may decide to
> release the same buffer at the same time.
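
The two points above can be illustrated with a minimal, single-threaded sketch. It is not the Flink code: java.nio.ByteBuffer#asReadOnlyBuffer stands in for the readOnlySlice mentioned in the issue (both give a view that shares content but keeps its own index), and the class HybridShuffleSketch, its methods, and the map layout are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class HybridShuffleSketch {

    /**
     * Point 1: returns how many bytes the "spilling" side still sees after a
     * consumer has read the buffer. ByteBuffer is an assumed stand-in for
     * Flink's buffer type; only the index-sharing behaviour is modelled.
     */
    static int spillableBytesAfterConsume(boolean handOutReadOnlyView) {
        ByteBuffer original = ByteBuffer.allocate(8);
        original.putLong(42L);
        original.flip(); // 8 readable bytes, position = 0

        // Either hand out the original, or a view with its own position/limit.
        ByteBuffer consumerView =
                handOutReadOnlyView ? original.asReadOnlyBuffer() : original;

        consumerView.getLong(); // the consumer advances *its* read position

        // What a spiller reading from 'original' would write afterwards.
        return original.remaining();
    }

    // Point 2: hypothetical stand-in for bufferIndexToContexts.
    private final Map<Integer, ByteBuffer> bufferIndexToContexts = new HashMap<>();
    private final Object lock = new Object();

    void add(int bufferIndex, ByteBuffer buffer) {
        synchronized (lock) {
            bufferIndexToContexts.put(bufferIndex, buffer);
        }
    }

    /**
     * Releases the buffer if it is still tracked. A second release decision
     * for the same index is ignored instead of raising an exception.
     *
     * @return true if this call released the buffer, false if it was already gone
     */
    boolean release(int bufferIndex) {
        synchronized (lock) {
            ByteBuffer removed = bufferIndexToContexts.remove(bufferIndex);
            return removed != null;
        }
    }

    public static void main(String[] args) {
        System.out.println("with view:    " + spillableBytesAfterConsume(true));
        System.out.println("without view: " + spillableBytesAfterConsume(false));

        HybridShuffleSketch manager = new HybridShuffleSketch();
        manager.add(0, ByteBuffer.allocate(4));
        System.out.println("first release:  " + manager.release(0));
        System.out.println("second release: " + manager.release(0));
    }
}
```

When the consumer reads the original buffer directly, the spilling side sees fewer remaining bytes, which mirrors the "less data written to disk" symptom. With a separate view the original's index is untouched. The release method treats a missing entry as "already released", so two threads that independently decide to release the same buffer do not fail.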