[ 
https://issues.apache.org/jira/browse/FLINK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhijiang updated FLINK-10653:
-----------------------------
    Summary: Fine-grained Shuffle System  (was: Propose of external shuffle 
service for batch jobs)

> Fine-grained Shuffle System
> ---------------------------
>
>                 Key: FLINK-10653
>                 URL: https://issues.apache.org/jira/browse/FLINK-10653
>             Project: Flink
>          Issue Type: New Feature
>          Components: Network
>    Affects Versions: 2.0.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>
> This is the umbrella issue for improving batch shuffle.
>  
> The network shuffle behavior for batch jobs in Flink can be improved in two 
> dimensions:
>  
> *1. ShuffleService:* Shuffle service is used for transporting upstream’s 
> outputs to the downstream side. The {{TaskExecutor}} takes the role of 
> shuffle service unified for both stream and batch jobs currently.
>  * The output is consumable only when the upstream task finishes for blocking 
> mode. The {{TaskExecutor}} can not exit to release resources even though 
> there are no tasks running, because it should responsible for transferring 
> data via shuffle service. This has the bad effects in resource dynamic or 
> sensitive scenarios.
>  
> 2.  *ResultPartition:* there are two types of subpartitions currently for the 
> outputs. The {{PipelinedSubPartition}} is used for stream jobs and the 
> {{SpillableSubpartition}} is used for batch jobs in blocking mode.
>  * The {{SpillableSubpartition}} generates one separate persistent file for 
> every subpartition. This partition-hash mode is not IO friendly for special 
> scenarios, i.e. large number of subpartitions with little data in every 
> subpartition. And it may exceed the inode limits for large scale jobs.
>  * The persistent files would be deleted immediately after reading to 
> transport layer. If the downstream task fails during consumption, all the 
> related upstream tasks have to be restarted to reproduce the outputs.
>  
> We propose three overall changes:
>  * Mechanism and setting of external shuffle service
>  * Output sort&merge persistent files for new {{ResultPartitionType}}
>  * Delete persistent output files via notification or ttl
>  
> The detail design doc would be submitted later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to