[ 
https://issues.apache.org/jira/browse/FLINK-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358615#comment-14358615
 ] 

ASF GitHub Bot commented on FLINK-1350:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/471#issuecomment-78474538
  
    Concerning the questions:
    
    1.  I think deploying after all blocking producers are finished is what we 
should go for as a start. It is also what people would expect from a blocking 
model.
    
    2.  This is a fair initial restriction. Let's relax it later; I can see 
benefits in that when dealing with tasks that cannot be deployed due to a 
lack of resources.
    
    A few questions:
    
    The pull request generifies the IOManager and uses asynchronous disk I/O 
for the intermediate result spilling. Is there any experience indicating that 
this actually helps performance in the case here? I am curious, because the 
async I/O in the hash join / sorters was tricky enough. The interaction between 
asynchronous disk I/O and asynchronous network I/O must be very tricky. I think 
there should be a good reason to do this, otherwise we simply introduce 
error-prone code for a completely unknown benefit.
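    For reference, the spilling pattern under discussion can be sketched 
roughly like this (a minimal hypothetical sketch, not Flink's actual IOManager 
API): a single I/O thread drains a queue of write requests and hands each 
buffer back through a callback once it is on disk, so the producer never 
blocks on the write itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of asynchronous spilling (names are illustrative,
 * not Flink classes): one I/O thread drains a request queue, writes each
 * buffer to disk, and recycles it via a callback.
 */
class AsyncSpillWriter implements AutoCloseable {
    private static final ByteBuffer POISON = ByteBuffer.allocate(0);
    private final BlockingQueue<ByteBuffer> requests = new LinkedBlockingQueue<>();
    private final Thread ioThread;
    private volatile IOException failure;

    AsyncSpillWriter(Path file, Consumer<ByteBuffer> recycler) throws IOException {
        FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        ioThread = new Thread(() -> {
            try (FileChannel ch = channel) {
                while (true) {
                    ByteBuffer buf = requests.take();
                    if (buf == POISON) {
                        return;                 // shutdown marker
                    }
                    while (buf.hasRemaining()) {
                        ch.write(buf);          // blocking write on the I/O thread
                    }
                    recycler.accept(buf);       // buffer may be reused by the producer
                }
            } catch (IOException e) {
                failure = e;
            } catch (InterruptedException ignored) {
                // shutdown
            }
        });
        ioThread.start();
    }

    /** Enqueue a filled buffer; returns immediately (asynchronous). */
    void write(ByteBuffer buffer) {
        requests.add(buffer);
    }

    @Override
    public void close() throws Exception {
        requests.add(POISON);
        ioThread.join();
        if (failure != null) {
            throw failure;
        }
    }
}
```

    The tricky part, as noted above, is not this write path but everything 
that happens once such buffers have to be read back and fed into the network 
stack.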
    
    The asynchronous writing seems straightforward. For the reading / sending 
part:
      - When do you issue the read requests to the reader (from disk)? Is that 
dependent on when the TCP channel is writable?
      - When a read request has been issued but the response has not come back 
yet, what happens if the subpartition is de-registered from Netty and then 
re-registered once a buffer has returned from disk?
      - Given many spilled partitions, which one is read from next? How is the 
buffer assignment realized? There is a lot of trickiness in there, because disk 
I/O performs well with longer sequential reads, but those may occupy many 
buffers that are then missing for other reads into writable TCP channels.
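    To make the buffer-accounting question above concrete, here is a 
hypothetical sketch (class and method names are illustrative, not Flink's 
actual classes) of read-ahead that is gated on both channel writability and a 
bounded buffer pool, so that one long sequential read cannot starve other 
spilled partitions of buffers:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical sketch: disk reads for a spilled partition are only
 * issued while (a) the outbound channel is writable and (b) a buffer
 * is available from a bounded pool. Buffers return to the pool once
 * they have been sent over the network.
 */
class SpilledReadScheduler {
    private final Semaphore bufferPool;                      // bounds buffers in flight
    private final Queue<String> pendingReads = new ArrayDeque<>();
    private int issuedReads;

    SpilledReadScheduler(int numBuffers) {
        this.bufferPool = new Semaphore(numBuffers);
    }

    /** Queue a spilled block for later reading. */
    void enqueue(String spilledBlock) {
        pendingReads.add(spilledBlock);
    }

    /** Issue as many reads as writability and the pool permit. */
    int issueReads(boolean channelWritable) {
        int issued = 0;
        while (channelWritable && !pendingReads.isEmpty() && bufferPool.tryAcquire()) {
            pendingReads.poll();    // here a real impl. would hand block + buffer to the disk reader
            issuedReads++;
            issued++;
        }
        return issued;
    }

    /** Called when a buffer has been sent over the network and recycled. */
    void recycleBuffer() {
        bufferPool.release();
    }

    int totalIssued() {
        return issuedReads;
    }
}
```

    Whether the actual implementation follows this kind of credit scheme, or 
something else, is exactly what I would like to see spelled out.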
    
    
    Can you elaborate on the mechanism behind this? I expect it to have quite 
an impact on both the reliability and the performance of the mechanism.
    
    *IMPORTANT*: A few weeks back, @tillrohrmann fixed the Asynchronous 
Channel Readers / Writers. Are we sure that this fix is not undone by the 
changes here?



> Add blocking intermediate result partitions
> -------------------------------------------
>
>                 Key: FLINK-1350
>                 URL: https://issues.apache.org/jira/browse/FLINK-1350
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>
> The current state of runtime support for intermediate results (see 
> https://github.com/apache/incubator-flink/pull/254 and FLINK-986) only 
> supports pipelined intermediate results (with back pressure), which are 
> consumed as they are being produced.
> The next variant we need to support are blocking intermediate results 
> (without back pressure), which are fully produced before being consumed. This 
> is for example desirable in situations, where we currently may run into 
> deadlocks when running pipelined.
> I will start working on this on top of my pending pull request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
