[ https://issues.apache.org/jira/browse/FLINK-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358657#comment-14358657 ]

ASF GitHub Bot commented on FLINK-1350:
---------------------------------------

Github user uce commented on the pull request:

    https://github.com/apache/flink/pull/471#issuecomment-78479167
  
    The root "cause" of all the asynchronous operations is that TCP connections 
are shared among multiple logical channels, and each connection is handled by 
one of a fixed number of network I/O threads. With synchronous I/O operations, 
we would essentially block progress on all channels sharing that 
connection/thread.
    
    > When do you issue the read requests to the reader (from disk)? Is that 
dependent on when the TCP channel is writable?
    
    Yes, the network I/O thread has subpartitions queued for transfer and only 
queries them for data when the TCP channel is writable.
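    To make the mechanism concrete, here is a minimal, dependency-free sketch 
(invented names, not Flink's or Netty's actual classes) of writability-driven 
transfer: subpartitions are queued, and the writer only polls them for data 
while the TCP channel is writable. In real Netty code, `setWritable(...)` 
would be driven from the pipeline's `channelWritabilityChanged(...)` callback.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the writability-gated transfer described above.
public class WritabilityGatedWriter {
    interface Subpartition {
        byte[] pollBuffer(); // returns null when no data is currently available
    }

    private final Deque<Subpartition> queued = new ArrayDeque<>();
    private boolean channelWritable = true;
    private int bytesWritten = 0;

    void enqueue(Subpartition p) {
        queued.add(p);
        drain();
    }

    // Netty would drive this from channelWritabilityChanged(...).
    void setWritable(boolean writable) {
        channelWritable = writable;
        if (writable) {
            drain(); // resume transfers as soon as the channel drains
        }
    }

    // Round-robins over the queued subpartitions, but only while writable.
    private void drain() {
        while (channelWritable && !queued.isEmpty()) {
            Subpartition p = queued.poll();
            byte[] buffer = p.pollBuffer();
            if (buffer != null) {
                bytesWritten += buffer.length; // stand-in for channel.write(buffer)
                queued.add(p);                 // may have more data; keep it queued
            }
            // If buffer == null the subpartition drops out of the queue here;
            // in the real system it is re-enqueued once data becomes available.
        }
    }

    int bytesWritten() { return bytesWritten; }
}
```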
    
    > When the read request is issued, is the subpartition de-registered from 
Netty before the response comes, and then re-registered once a buffer has 
returned from disk?
    
    Exactly. If there is no buffer available, the read request is issued and 
the next available subpartition is tried. If none of the subpartitions has data 
available, the network I/O thread works on another TCP channel (this is done by 
Netty, which multiplexes all TCP channels over a fixed amount of network I/O 
threads).
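    The fallback above can be sketched as follows (hypothetical names, not 
Flink code): if a spilled subpartition has no buffer in memory, the network 
I/O thread issues a non-blocking read request and immediately tries the next 
subpartition, never waiting on the disk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the "issue the read, try the next subpartition"
// fallback described above.
public class AsyncReadScheduler {
    interface SpilledSubpartition {
        byte[] pollBuffer();     // in-memory buffer, or null if still on disk
        void triggerAsyncRead(); // non-blocking: ask the disk reader for more
    }

    // One pass over the queued subpartitions; returns the buffers that were
    // immediately available, in service order.
    static List<byte[]> serveOnce(Deque<SpilledSubpartition> queue) {
        List<byte[]> served = new ArrayList<>();
        int n = queue.size();
        for (int i = 0; i < n; i++) {
            SpilledSubpartition p = queue.poll();
            byte[] buffer = p.pollBuffer();
            if (buffer == null) {
                p.triggerAsyncRead(); // issue the read request; don't block on it
            } else {
                served.add(buffer);
                queue.add(p);         // rotate: fairness across subpartitions
            }
        }
        return served;
    }
}
```

    If every queued subpartition comes up empty, the thread simply returns 
from this pass, which is the point where Netty would move it on to another 
TCP channel.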
    
    > Given many spilled partitions, which one is read from next? How is the 
buffer assignment realized? There is a lot of trickiness in there, because 
disk I/O performs well with longer sequential reads, but those may occupy 
many buffers that are then missing for other reads into writable TCP channels.
    
    Initially this depends on the order of partition requests; after that, on 
the order of data availability. 
    Regarding the buffers: trickiness, indeed. The current buffer handling is 
an intermediate solution, as we will issue zero-transfer reads in the future 
(this requires minimal changes), where we essentially only trigger reads to 
gather offsets. The reads are then affected only by TCP channel writability. 
Currently, the reads are batched in sizes of two buffers (64k).
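    The current batching can be sketched as below. This is a hypothetical 
illustration, not Flink code; a per-buffer size of 32 KiB is an assumption 
here, chosen so that two buffers add up to the 64k batch mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each read request fills a batch of two buffers
// (assumed 32 KiB each, i.e. 64 KiB per batch), giving the disk a longer
// sequential read than one-buffer-at-a-time polling would.
public class BatchedSpillReader {
    static final int BUFFER_SIZE = 32 * 1024; // assumption: 32 KiB per buffer
    static final int BUFFERS_PER_BATCH = 2;   // "batched in sizes of two buffers (64k)"

    static List<byte[]> readBatch(InputStream spillFile) {
        List<byte[]> batch = new ArrayList<>();
        try {
            for (int i = 0; i < BUFFERS_PER_BATCH; i++) {
                byte[] buffer = spillFile.readNBytes(BUFFER_SIZE);
                if (buffer.length == 0) {
                    break; // end of the spill file
                }
                batch.add(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return batch;
    }
}
```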
    
    ----
    
    Regarding @tillrohrmann's changes: what were these exactly? Then I can 
verify that they have not been undone.
    
    In general (minus the question regarding Till's changes), I think this PR 
is good to merge. The tests are stable and passing. There will definitely be a 
need for refactoring and performance evaluations, but I think that is to be 
expected with such a big change.


> Add blocking intermediate result partitions
> -------------------------------------------
>
>                 Key: FLINK-1350
>                 URL: https://issues.apache.org/jira/browse/FLINK-1350
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>
> The current state of runtime support for intermediate results (see 
> https://github.com/apache/incubator-flink/pull/254 and FLINK-986) only 
> supports pipelined intermediate results (with back pressure), which are 
> consumed as they are being produced.
> The next variant we need to support are blocking intermediate results 
> (without back pressure), which are fully produced before being consumed. This 
> is for example desirable in situations, where we currently may run into 
> deadlocks when running pipelined.
> I will start working on this on top of my pending pull request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
