Github user uce commented on the pull request:

    https://github.com/apache/flink/pull/471#issuecomment-78479167
  
    The root "cause" of all asynchronous operations is that TCP connections are 
shared among multiple logical channels, which are handled by a fixed number of 
network I/O threads. In case of synchronous I/O operations, we would 
essentially block progress on all channels sharing that connection/thread.
    
    > When do you issue the read requests to the reader (from disk)? Is that 
dependent on when the TCP channel is writable?
    
    Yes, the network I/O thread has subpartitions queued for transfer and only 
queries them for data when the TCP channel is writable.
    
    > When the read request is issued, before the response comes, if the 
subpartition de-registered from netty and the re-registered one a buffer has 
returned from disk?
    
    Exactly. If there is no buffer available, the read request is issued and 
the next available subpartition is tried. If none of the subpartitions has data 
available, the network I/O thread works on another TCP channel (this is done by 
Netty, which multiplexes all TCP channels over a fixed amount of network I/O 
threads).
    
    > Given many spilled partitions, which one is read from next? How is the 
buffer assignment realized? There is a lot of trickyness in there, because disk 
I/O performs well with longer sequential reads, but that may occupy many 
buffers that are missing for other reads into writable TCP channels.
    
    Initially this depends on the order of partition requests. After that on 
the order of data availability. 
    Regarding the buffers: trickyness, indeed. The current state with the 
buffers is kind of an intermediate solution as we will issue zero-transfer 
reads in the future (requires minimal changes), where we essentially only 
trigger reads to gather offsets. The reads are then only affected by TCP 
channel writability. Currently, the reads are batched in sizes of two buffers 
(64k).
    
    ----
    
    Regarding @tillrohrmann's changes: what was this exactly? Then I can verify 
that the changes are not undone.
    
    In general (minus the question regarding Till's changes) I think this PR is 
good to merge. The tests are stable and passing. There will be definitely a 
need to do refactorings and performance evaluations, but I think that is to be 
expected with such a big change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to