[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860643#comment-16860643 ]

Yingjie Cao commented on FLINK-12070:
-------------------------------------

[~StephanEwen] Last week, I implemented several new ways of storing the 
data following your suggestions, including:
 # spill to file directly (based on the first commit before the bounded 
blocking subpartition commit).
 # sync write using the FileChannel.write method (based on the bounded 
blocking subpartition commit), mapping the region when the 
BufferToByteBuffer.Writer.complete method is called.
 # async write using the IOManager (based on the bounded blocking 
subpartition commit), mapping the region when the 
BufferToByteBuffer.Writer.complete method is called (mapping when writing 
finishes may be better).
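To make the second option concrete, here is a minimal sketch of the "sync write, then map on complete" idea, assuming plain java.nio rather than Flink's actual classes. The class SyncWriteThenMap and its methods are hypothetical illustrations, not the code from the bounded blocking subpartition commit: buffers are written with FileChannel.write, and the written region is mapped read-only once the writer completes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch (not Flink code): buffers are appended synchronously
 * via FileChannel.write, and when the producer completes, the written
 * region is memory-mapped so it can be read afterwards.
 */
public class SyncWriteThenMap {

    private final FileChannel channel;
    private long bytesWritten;

    public SyncWriteThenMap(Path file) throws IOException {
        // READ is required because the region is mapped READ_ONLY later.
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE);
    }

    /** Writes one buffer fully; a single FileChannel.write call may write fewer bytes. */
    public void write(ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            bytesWritten += channel.write(buffer);
        }
    }

    /** Called when writing is done (analogous to Writer.complete): maps the written region. */
    public MappedByteBuffer complete() throws IOException {
        MappedByteBuffer mapped =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, bytesWritten);
        // Per the FileChannel.map docs, the mapping stays valid after the channel is closed.
        channel.close();
        return mapped;
    }
}
```

Each consumer can take its own duplicate() of the returned buffer, which shares the mapped content but has an independent position, so the same region can be read several times without re-reading the file through the channel.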

The test is still running and needs two or three more days to finish, but 
judging from the partial results, all of the above implementations incur a 
performance regression for some test cases (where the data volume is medium 
and can be cached in memory by the spillable subpartition). I'd like to 
rerun the benchmark if there are new implementations.

> Make blocking result partitions consumable multiple times
> ---------------------------------------------------------
>
>                 Key: FLINK-12070
>                 URL: https://issues.apache.org/jira/browse/FLINK-12070
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>    Affects Versions: 1.9.0
>            Reporter: Till Rohrmann
>            Assignee: Stephan Ewen
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>         Attachments: image-2019-04-18-17-38-24-949.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In order to avoid writing produced results multiple times for multiple 
> consumers and in order to speed up batch recoveries, we should make 
> blocking result partitions consumable multiple times. At the moment, a 
> blocking result partition is released once the consumers have processed 
> all data. Instead, the result partition should be released once the next 
> blocking result has been produced and all consumers of a blocking result 
> partition have terminated. Moreover, blocking results should not hold on to 
> slot resources like network buffers or memory, as is currently the case 
> with {{SpillableSubpartitions}}.
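The release rule in the issue description can be sketched as a small state tracker. This is a hypothetical illustration, not a Flink API: the class BlockingPartitionReleaseTracker and its callbacks are assumptions, and a real implementation would also need to handle consumers restarted during recovery.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed release rule: a blocking result
 * partition is released only once (a) the next blocking result has been
 * produced and (b) all of its consumers have terminated. Until then it can
 * be consumed again, e.g. for batch recovery.
 */
public class BlockingPartitionReleaseTracker {

    private final Set<String> runningConsumers = new HashSet<>();
    private boolean nextResultProduced;
    private boolean released;

    public BlockingPartitionReleaseTracker(Set<String> consumers) {
        runningConsumers.addAll(consumers);
    }

    /** Called when one consumer task has terminated. */
    public void onConsumerTerminated(String consumerId) {
        runningConsumers.remove(consumerId);
        maybeRelease();
    }

    /** Called when the downstream blocking result has been fully produced. */
    public void onNextBlockingResultProduced() {
        nextResultProduced = true;
        maybeRelease();
    }

    public boolean isReleased() {
        return released;
    }

    private void maybeRelease() {
        if (!released && nextResultProduced && runningConsumers.isEmpty()) {
            // A real implementation would delete files / free memory here.
            released = true;
        }
    }
}
```

The point of the sketch is that consumption alone no longer triggers the release; both conditions have to hold, in either order.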



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
