[
https://issues.apache.org/jira/browse/FLINK-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358615#comment-14358615
]
ASF GitHub Bot commented on FLINK-1350:
---------------------------------------
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/471#issuecomment-78474538
Concerning the questions:
1. I think deploying after all blocking producers are finished is what we
should go for as a start. It is also what people would expect from a blocking
model.
2. This is a fair initial restriction. Let's relax it later; I can see
benefits in doing so when dealing with tasks that cannot be deployed due to a
lack of resources.
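Regarding point 1, a minimal sketch of such a deployment trigger could look like the
following. The class and callback names here are hypothetical and not Flink APIs; it
only illustrates deploying the consumers once every producer of a blocking result has
finished.
```java
// Hypothetical sketch (not Flink API): deploy consumers of a blocking result
// only once every producer subtask has finished.
import java.util.concurrent.atomic.AtomicInteger;

public class BlockingResultDeployment {

    private final int numberOfProducers;
    private final AtomicInteger finishedProducers = new AtomicInteger(0);
    private final Runnable deployConsumers; // e.g. schedules the consumer tasks

    public BlockingResultDeployment(int numberOfProducers, Runnable deployConsumers) {
        this.numberOfProducers = numberOfProducers;
        this.deployConsumers = deployConsumers;
    }

    /** Called when a producer subtask of the blocking result reaches FINISHED. */
    public void onProducerFinished() {
        if (finishedProducers.incrementAndGet() == numberOfProducers) {
            // All partitions of the blocking result are fully produced,
            // so the consumers can now be deployed.
            deployConsumers.run();
        }
    }
}
```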
A few questions:
The pull request generifies the IOManager and uses asynchronous disk I/O
for the intermediate result spilling. Is there any experience indicating that this
helps performance in this case? I am curious, because the async I/O in the
hash join / sorters was tricky enough. The interaction between asynchronous
disk I/O and asynchronous network I/O must be very tricky. I think there should
be a good reason to do this, otherwise we simply introduce error-prone code for
a completely unknown benefit.
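For reference, here is a minimal, hypothetical sketch of the asynchronous-spilling
pattern under discussion. It is not the actual IOManager API; it only illustrates
handing buffers to a dedicated I/O thread and getting a completion signal back, which
is the indirection whose benefit is being questioned above.
```java
// Illustrative only: asynchronous spill writing with a completion future.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncSpillWriter implements AutoCloseable {

    private final FileChannel channel;
    private final ExecutorService ioExecutor = Executors.newSingleThreadExecutor();

    public AsyncSpillWriter(Path spillFile) throws IOException {
        this.channel = FileChannel.open(spillFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    /** Hands the buffer to the I/O thread; the future completes when it is on disk. */
    public CompletableFuture<Void> write(ByteBuffer buffer) {
        return CompletableFuture.runAsync(() -> {
            try {
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
            } catch (IOException e) {
                throw new RuntimeException("Spill write failed", e);
            }
        }, ioExecutor);
    }

    @Override
    public void close() throws IOException {
        // A real implementation would wait for pending writes before closing.
        ioExecutor.shutdown();
        channel.close();
    }
}
```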
The asynchronous writing seems straightforward. For the reading / sending
part:
- When do you issue the read requests to the reader (from disk)? Is that
dependent on when the TCP channel is writable?
- When a read request has been issued but the response has not yet come back,
is the subpartition de-registered from netty and then re-registered once a
buffer has returned from disk?
- Given many spilled partitions, which one is read from next? How is the
buffer assignment realized? There is a lot of trickiness in there, because disk
I/O performs well with longer sequential reads, but those may occupy many
buffers that are then missing for other reads into writable TCP channels.
Can you elaborate on the mechanism behind this? I expect this to have quite
an impact on the reliability of the mechanism and the performance.
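To make the first question concrete, here is a purely illustrative sketch (not the
implementation in this pull request) of coupling disk reads to TCP writability, so
that spilled buffers are only flushed while the outgoing netty channel can accept
them. SpilledPartitionSender and onBufferReadFromDisk are made-up names; only the
netty handler callbacks are real API.
```java
// Illustrative only: send spilled buffers while the netty channel is writable.
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SpilledPartitionSender extends ChannelInboundHandlerAdapter {

    /** Buffers handed back by the (hypothetical) asynchronous disk reader. */
    private final Queue<ByteBuffer> readyBuffers = new ConcurrentLinkedQueue<>();

    /** Called from the disk reader thread when a read request has completed. */
    public void onBufferReadFromDisk(ChannelHandlerContext ctx, ByteBuffer buffer) {
        readyBuffers.add(buffer);
        ctx.executor().execute(() -> drain(ctx));
    }

    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) {
        // The TCP channel became writable again: flush queued buffers; a full
        // implementation would also issue the next disk read request here.
        drain(ctx);
        ctx.fireChannelWritabilityChanged();
    }

    private void drain(ChannelHandlerContext ctx) {
        ByteBuffer next;
        while (ctx.channel().isWritable() && (next = readyBuffers.poll()) != null) {
            ctx.writeAndFlush(Unpooled.wrappedBuffer(next));
        }
    }
}
```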
*IMPORTANT*: There was a fix by @tillrohrmann to the Asynchronous
Channel Readers / Writers a few weeks back. Are we sure that it is not
undone by the changes here?
> Add blocking intermediate result partitions
> -------------------------------------------
>
> Key: FLINK-1350
> URL: https://issues.apache.org/jira/browse/FLINK-1350
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Runtime
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
>
> The current state of runtime support for intermediate results (see
> https://github.com/apache/incubator-flink/pull/254 and FLINK-986) only
> supports pipelined intermediate results (with back pressure), which are
> consumed as they are being produced.
> The next variant we need to support is blocking intermediate results
> (without back pressure), which are fully produced before being consumed. This
> is desirable, for example, in situations where we currently may run into
> deadlocks when running pipelined.
> I will start working on this on top of my pending pull request.
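Not part of the issue text, but as an illustration of the deadlock motivation: a
branch-and-rejoin plan is the classic shape where a purely pipelined exchange with
bounded buffers can deadlock, because one branch may have to buffer data that the
other branch still needs. The sketch below uses the DataSet API with made-up data.
```java
// Illustrative branch-and-rejoin plan: with only pipelined exchanges and
// bounded buffers this shape can deadlock; a blocking (fully materialized)
// intermediate result on one branch breaks the cycle.
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class BranchAndRejoin {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, String>> source = env.fromElements(
                new Tuple2<>(1, "a"), new Tuple2<>(2, "b"), new Tuple2<>(3, "c"));

        // Two branches over the same source ...
        DataSet<Tuple2<Integer, String>> branchA = source.map(new Tag("A"));
        DataSet<Tuple2<Integer, String>> branchB = source.map(new Tag("B"));

        // ... that are rejoined on the first tuple field.
        branchA.join(branchB).where(0).equalTo(0).print();
    }

    private static class Tag
            implements MapFunction<Tuple2<Integer, String>, Tuple2<Integer, String>> {

        private final String tag;

        Tag(String tag) {
            this.tag = tag;
        }

        @Override
        public Tuple2<Integer, String> map(Tuple2<Integer, String> value) {
            return new Tuple2<>(value.f0, tag + ":" + value.f1);
        }
    }
}
```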