[
https://issues.apache.org/jira/browse/FLINK-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358657#comment-14358657
]
ASF GitHub Bot commented on FLINK-1350:
---------------------------------------
Github user uce commented on the pull request:
https://github.com/apache/flink/pull/471#issuecomment-78479167
The root "cause" of all asynchronous operations is that TCP connections are
shared among multiple logical channels, which are handled by a fixed number of
network I/O threads. In case of synchronous I/O operations, we would
essentially block progress on all channels sharing that connection/thread.
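As a minimal sketch of that threading model (illustrative, not the PR's actual
bootstrap code; assumes Netty 4 on the classpath):

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class NetworkStackSketch {
    public static void main(String[] args) throws InterruptedException {
        // A fixed number of network I/O threads serves ALL TCP connections,
        // each of which multiplexes many logical channels.
        EventLoopGroup ioThreads = new NioEventLoopGroup(4);

        ServerBootstrap bootstrap = new ServerBootstrap()
            .group(ioThreads)
            .channel(NioServerSocketChannel.class)
            .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                protected void initChannel(SocketChannel ch) {
                    ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                        @Override
                        public void channelRead(ChannelHandlerContext ctx, Object msg) {
                            // A synchronous disk read here would block this
                            // I/O thread and thereby stall every connection
                            // (and logical channel) assigned to it, which is
                            // why all potentially blocking work is issued
                            // asynchronously.
                        }
                    });
                }
            });

        bootstrap.bind(6121).sync(); // port is illustrative
    }
}
```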
> When do you issue the read requests to the reader (from disk)? Is that
dependent on when the TCP channel is writable?
Yes, the network I/O thread has subpartitions queued for transfer and only
queries them for data when the TCP channel is writable.
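A hedged sketch of that loop (SubpartitionView, Buffer, and pollBuffer() are
illustrative stand-ins, not the PR's exact API):

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import java.util.ArrayDeque;
import java.util.Queue;

interface Buffer {}

interface SubpartitionView {
    // Returns null while the backing disk read is still in flight.
    Buffer pollBuffer();
}

class OutboundHandler extends ChannelInboundHandlerAdapter {
    // Subpartitions queued for transfer; touched only from the I/O thread.
    private final Queue<SubpartitionView> queuedViews = new ArrayDeque<>();

    void enqueueAndDrain(ChannelHandlerContext ctx, SubpartitionView view) {
        queuedViews.add(view);
        drain(ctx);
    }

    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) {
        if (ctx.channel().isWritable()) {
            drain(ctx); // resume transfers once TCP is writable again
        }
        ctx.fireChannelWritabilityChanged();
    }

    // Query queued subpartitions for data only while the channel is writable.
    private void drain(ChannelHandlerContext ctx) {
        while (ctx.channel().isWritable()) {
            SubpartitionView view = queuedViews.poll();
            if (view == null) {
                return; // nothing queued; the thread serves other channels
            }
            Buffer buffer = view.pollBuffer();
            if (buffer != null) {
                ctx.writeAndFlush(buffer);
                queuedViews.add(view); // round robin with the other views
            }
            // buffer == null: a read was just issued; the view stays out of
            // the queue until its data arrives (see the next answer)
        }
    }
}
```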
> When the read request is issued, before the response comes, is the
subpartition de-registered from Netty and then re-registered once a buffer has
returned from disk?
Exactly. If there is no buffer available, the read request is issued and
the next available subpartition is tried. If none of the subpartitions has data
available, the network I/O thread works on another TCP channel (this is done by
Netty, which multiplexes all TCP channels over a fixed number of network I/O
threads).
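The re-registration can be sketched as follows, reusing the illustrative types
from the previous sketch: the disk reader's completion callback hands the view
back to the I/O thread, which re-enqueues it.

```java
import io.netty.channel.ChannelHandlerContext;

class AvailabilityListener {
    private final ChannelHandlerContext ctx;
    private final OutboundHandler handler;

    AvailabilityListener(ChannelHandlerContext ctx, OutboundHandler handler) {
        this.ctx = ctx;
        this.handler = handler;
    }

    // Called by the async disk reader once a buffer has been filled.
    void notifyDataAvailable(SubpartitionView view) {
        // Hop back onto the I/O thread: the queue is only ever touched from
        // that single thread, so no further synchronization is needed.
        ctx.executor().execute(() -> handler.enqueueAndDrain(ctx, view));
    }
}
```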
> Given many spilled partitions, which one is read from next? How is the
buffer assignment realized? There is a lot of trickiness in there, because disk
I/O performs well with longer sequential reads, but that may occupy many
buffers that are missing for other reads into writable TCP channels.
Initially, this depends on the order of the partition requests; after that, on
the order in which data becomes available.
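To illustrate the buffer side, here is a sketch under stated assumptions:
java.nio async file I/O stands in for the PR's actual reader, and the batch
matches the two-buffer (64k total) batching mentioned at the end of this
comment.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

class BatchedSpillReader {
    // Two 32 KB buffers per request, i.e. 64k of read-ahead: long enough for
    // reasonably sequential disk I/O, small enough not to hoard buffers that
    // other subpartitions need for their own writable TCP channels.
    private static final int BUFFER_SIZE = 32 * 1024;
    private static final int BATCH = 2;

    private final AsynchronousFileChannel channel;
    private long position;

    BatchedSpillReader(Path spillFile) throws IOException {
        this.channel = AsynchronousFileChannel.open(spillFile, StandardOpenOption.READ);
    }

    // Issues up to BATCH async reads; onBufferRead is the availability
    // callback that re-enqueues the subpartition view (see above).
    void issueReadBatch(Consumer<ByteBuffer> onBufferRead) {
        for (int i = 0; i < BATCH; i++) {
            ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
            long readPosition = position;
            position += BUFFER_SIZE;
            channel.read(buffer, readPosition, buffer,
                new CompletionHandler<Integer, ByteBuffer>() {
                    @Override
                    public void completed(Integer bytesRead, ByteBuffer buf) {
                        buf.flip();
                        onBufferRead.accept(buf); // data available again
                    }

                    @Override
                    public void failed(Throwable cause, ByteBuffer buf) {
                        cause.printStackTrace(); // error handling elided
                    }
                });
        }
    }
}
```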
Regarding the buffers: trickiness, indeed. The current buffer handling is
kind of an intermediate solution, as we will issue zero-copy transfers in the
future (requires minimal changes): we would then essentially only trigger
reads to gather offsets, and the transfers would be gated solely by TCP
channel writability. Currently, the reads are batched in sizes of two buffers
(64k).
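For the zero-copy variant, Netty's FileRegion maps directly to sendfile, so
something along these lines should suffice (a sketch; the offset/length
bookkeeping is assumed to come from the gathered offsets):

```java
import io.netty.channel.Channel;
import io.netty.channel.DefaultFileRegion;
import java.io.RandomAccessFile;

class ZeroCopySend {
    // Only (offset, length) is gathered in user space; the payload itself is
    // transferred from the spill file to the socket by the kernel
    // (FileChannel.transferTo / sendfile), so the transfer is gated purely
    // by TCP channel writability and occupies no network buffers.
    static void transfer(Channel tcpChannel, RandomAccessFile spillFile,
                         long offset, long length) {
        tcpChannel.writeAndFlush(
            new DefaultFileRegion(spillFile.getChannel(), offset, length));
    }
}
```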
----
Regarding @tillrohrmann's changes: what were they exactly? Then I can verify
that they have not been undone.
In general (minus the question regarding Till's changes), I think this PR is
good to merge. The tests are stable and passing. There will definitely be a
need for refactoring and performance evaluation, but I think that is to be
expected with such a big change.
> Add blocking intermediate result partitions
> -------------------------------------------
>
> Key: FLINK-1350
> URL: https://issues.apache.org/jira/browse/FLINK-1350
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Runtime
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
>
> The current state of runtime support for intermediate results (see
> https://github.com/apache/incubator-flink/pull/254 and FLINK-986) only
> supports pipelined intermediate results (with back pressure), which are
> consumed as they are being produced.
> The next variant we need to support are blocking intermediate results
> (without back pressure), which are fully produced before being consumed. This
> is, for example, desirable in situations where we currently may run into
> deadlocks when running pipelined.
> I will start working on this on top of my pending pull request.
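The distinction in the description can be summarized as follows (an
illustrative sketch, not necessarily the PR's actual type):

```java
// Two variants of intermediate result partitions, per the description above.
enum ResultPartitionType {
    // Consumed while being produced; back pressure throttles the producer.
    PIPELINED,
    // Fully produced before consumption starts; no back pressure, which
    // avoids the deadlocks that pipelined execution can run into.
    BLOCKING
}
```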
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)