Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/471#issuecomment-78474538
Concerning the questions:
1. I think deploying after all blocking producers are finished is what we
should go for as a start. It is also what people would expect from a blocking
model.
2. This is a fair initial restriction. Let's relax that later; I can see
benefits in that when dealing with tasks that cannot be deployed due to a
lack of resources.
A few questions:
The pull request generifies the IOManager and uses asynchronous disk I/O
for the intermediate result spilling. Is there any evidence or experience that
this helps performance in the case here? I am curious, because the async I/O in
the hash join / sorters was tricky enough. The interaction between asynchronous
disk I/O and asynchronous network I/O must be very tricky. I think there should
be a good reason to do this; otherwise we simply introduce error-prone code for
a completely unknown benefit.
The asynchronous writing seems straightforward. For the reading / sending
part:
- When do you issue the read requests to the reader (from disk)? Is that
dependent on when the TCP channel is writable?
- When a read request has been issued but the response has not yet arrived,
is the subpartition de-registered from netty and then re-registered once a
buffer has returned from disk?
- Given many spilled partitions, which one is read from next? How is the
buffer assignment realized? There is a lot of trickiness in there, because disk
I/O performs well with longer sequential reads, but those may occupy many
buffers that are then missing for reads into other writable TCP channels.
Can you elaborate on the mechanism behind this? I expect this to have quite
an impact on the reliability of the mechanism and the performance.
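
To make the buffer-assignment question concrete, here is a minimal sketch of
one possible policy. It is not what the pull request does, and every name in it
(`SpilledReadScheduler`, `Subpartition`, `issueReads`, `recycle`) is
hypothetical rather than a Flink or Netty class. It assumes a fixed pool of
read buffers, issues disk reads round-robin only for subpartitions whose
channel is currently writable, and caps the buffers a single subpartition may
hold so that one long sequential read cannot starve the others.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch only: none of these names are Flink or Netty classes. */
final class SpilledReadScheduler {

    /** One spilled subpartition together with its simulated outgoing channel. */
    static final class Subpartition {
        final int id;
        int blocksOnDisk;        // blocks not yet requested from disk
        boolean channelWritable; // stand-in for the TCP channel's writability
        int buffersInFlight;     // buffers currently held for this subpartition

        Subpartition(int id, int blocksOnDisk, boolean channelWritable) {
            this.id = id;
            this.blocksOnDisk = blocksOnDisk;
            this.channelWritable = channelWritable;
        }
    }

    private final int maxBuffersPerSubpartition;
    private int freeBuffers;
    private final Queue<Subpartition> roundRobin = new ArrayDeque<>();

    SpilledReadScheduler(int totalBuffers, int maxBuffersPerSubpartition) {
        this.freeBuffers = totalBuffers;
        this.maxBuffersPerSubpartition = maxBuffersPerSubpartition;
    }

    void register(Subpartition s) {
        roundRobin.add(s);
    }

    /**
     * Issues read requests while buffers are free. Returns the ids of the
     * subpartitions in the order in which their reads were requested.
     */
    List<Integer> issueReads() {
        List<Integer> issued = new ArrayList<>();
        boolean progress = true;
        while (freeBuffers > 0 && progress) {
            progress = false;
            int n = roundRobin.size();
            for (int i = 0; i < n && freeBuffers > 0; i++) {
                Subpartition s = roundRobin.poll();
                roundRobin.add(s); // rotate so every subpartition gets a turn
                boolean eligible = s.channelWritable
                        && s.blocksOnDisk > 0
                        && s.buffersInFlight < maxBuffersPerSubpartition;
                if (eligible) {
                    s.blocksOnDisk--;
                    s.buffersInFlight++;
                    freeBuffers--;
                    issued.add(s.id);
                    progress = true;
                }
            }
        }
        return issued;
    }

    /** Called once a buffer has been written to the network and is free again. */
    void recycle(Subpartition s) {
        s.buffersInFlight--;
        freeBuffers++;
    }

    public static void main(String[] args) {
        SpilledReadScheduler scheduler = new SpilledReadScheduler(4, 2);
        Subpartition a = new Subpartition(0, 5, true);
        Subpartition b = new Subpartition(1, 5, false); // channel not writable yet
        scheduler.register(a);
        scheduler.register(b);
        System.out.println(scheduler.issueReads());
        b.channelWritable = true; // channel becomes writable later
        System.out.println(scheduler.issueReads());
    }
}
```

The per-subpartition cap is the knob that trades sequential read length
against fairness: a higher cap favors disk throughput, a lower one keeps
buffers available for other writable channels. Whether the pull request makes
a comparable trade-off is exactly what the questions above are asking.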
*IMPORTANT*: There was a fix by @tillrohrmann to the Asynchronous
Channel Readers / Writers a few weeks back. Are we sure that it is not
undone by the changes here?