[ 
https://issues.apache.org/jira/browse/FLINK-22672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379635#comment-17379635
 ] 

Jin Xing commented on FLINK-22672:
----------------------------------

> This is great to hear. I am looking forward to this shuffle service
> implementation
 
Thanks a lot, [~trohrmann], for the encouragement.
We hope Flink users, especially those running batch processing jobs in
production, can benefit from Lattice (a remote shuffle service for Flink).
 
Our team has recently been discussing the right place to open source
Lattice.
 
One option is to apply to open source it as an affiliated project of Flink
under [https://github.com/apache], like flink-ml and flink-statefun. Lattice
is based on the 'Pluggable Shuffle Service' and can work as an independent
plugin. Users with batch processing workloads could achieve better
performance and stability in production by using Lattice. By being public
under Apache, we believe Lattice could get more attention, feedback, and
contributions from the community.
 
Another option is to open source it under our company's organization,
[https://github.com/alibaba].
 
What do you think? Which option do you prefer? We would like to hear your
advice and would really appreciate your help with this. If open sourcing
under Apache is recommended, what are the proper next steps?
 

> Some enhancements for pluggable shuffle service framework
> ---------------------------------------------------------
>
>                 Key: FLINK-22672
>                 URL: https://issues.apache.org/jira/browse/FLINK-22672
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>            Reporter: Jin Xing
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The "pluggable shuffle service" in Flink provides an architecture that is 
> unified for both streaming and batch jobs, allowing users to customize the 
> process of data transfer between shuffle stages according to their scenarios.
> There are already a number of "remote shuffle service" implementations for 
> Spark, such as [1][2][3]. Remote shuffle makes it possible to shuffle data 
> from/to a remote cluster and brings benefits such as:
>  # The lifecycle of computing resources can be decoupled from that of shuffle 
> data: once a computing task has finished, idle computing nodes can be released 
> while their completed shuffle data stays on the remote shuffle cluster.
>  # There is no need to reserve disk capacity for shuffle on computing nodes. 
> The remote shuffle cluster serves shuffle requests with better scalability 
> and relieves local disk pressure on computing nodes when data is skewed.
> Based on the "pluggable shuffle service", we built our own "remote shuffle 
> service" for Flink, called Lattice, which aims to provide functionality and 
> improve performance for batch processing jobs. It works as follows:
>  # The Lattice cluster works as an independent service that handles shuffle 
> requests;
>  # LatticeShuffleMaster extends ShuffleMaster, runs inside the JM, and talks 
> to the remote Lattice cluster to apply for shuffle resources and manage the 
> shuffle data lifecycle;
>  # LatticeShuffleEnvironment extends ShuffleEnvironment, runs inside the TM, 
> and provides an environment for shuffling data from/to the remote Lattice 
> cluster.
> While building Lattice, we found some potential enhancements to the 
> "pluggable shuffle service". I will enumerate them and create sub-JIRAs under 
> this umbrella.
>  
> [1] 
> [https://www.alibabacloud.com/blog/emr-remote-shuffle-service-a-powerful-elastic-tool-of-serverless-spark_597728]
> [2] [https://bestoreo.github.io/post/cosco/cosco/]
> [3] [https://github.com/uber/RemoteShuffleService]
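
As a rough illustration of the three-part architecture in the issue description above, here is a minimal Java sketch. Everything in it is hypothetical: SketchShuffleMaster and SketchShuffleEnvironment are simplified stand-ins for Flink's ShuffleMaster and ShuffleEnvironment (whose real signatures are richer and different), RemoteClusterStub is an in-memory placeholder for the remote Lattice cluster, and none of the class or method names are taken from Lattice itself. It only shows the division of labor: the master side manages the partition lifecycle, the environment side moves data, and both talk to the same remote service.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for Flink's ShuffleMaster (JM side).
interface SketchShuffleMaster {
    String registerPartition(String partitionId); // returns a shuffle descriptor
    void releasePartition(String partitionId);
}

// Hypothetical, simplified stand-in for Flink's ShuffleEnvironment (TM side).
interface SketchShuffleEnvironment {
    void write(String descriptor, String record);
    List<String> read(String descriptor);
}

// In-memory placeholder for the independent remote shuffle cluster (assumption).
class RemoteClusterStub {
    private final Map<String, List<String>> partitions = new HashMap<>();

    String allocate(String partitionId) {
        partitions.put(partitionId, new ArrayList<>());
        return partitionId; // the descriptor is just the id in this sketch
    }

    void append(String descriptor, String record) {
        lookup(descriptor).add(record);
    }

    List<String> fetch(String descriptor) {
        return new ArrayList<>(lookup(descriptor)); // defensive copy
    }

    void release(String partitionId) {
        partitions.remove(partitionId);
    }

    boolean holds(String partitionId) {
        return partitions.containsKey(partitionId);
    }

    private List<String> lookup(String descriptor) {
        List<String> data = partitions.get(descriptor);
        if (data == null) {
            throw new IllegalStateException("Unknown or released partition: " + descriptor);
        }
        return data;
    }
}

// Master side: applies for shuffle resources and manages the data lifecycle.
class LatticeMasterSketch implements SketchShuffleMaster {
    private final RemoteClusterStub cluster;

    LatticeMasterSketch(RemoteClusterStub cluster) {
        this.cluster = cluster;
    }

    @Override
    public String registerPartition(String partitionId) {
        return cluster.allocate(partitionId);
    }

    @Override
    public void releasePartition(String partitionId) {
        cluster.release(partitionId);
    }
}

// Task side: writes shuffle data to, and reads it from, the remote cluster.
class LatticeEnvironmentSketch implements SketchShuffleEnvironment {
    private final RemoteClusterStub cluster;

    LatticeEnvironmentSketch(RemoteClusterStub cluster) {
        this.cluster = cluster;
    }

    @Override
    public void write(String descriptor, String record) {
        cluster.append(descriptor, record);
    }

    @Override
    public List<String> read(String descriptor) {
        return cluster.fetch(descriptor);
    }
}

class LatticeSketchDemo {
    public static void main(String[] args) {
        RemoteClusterStub cluster = new RemoteClusterStub();
        SketchShuffleMaster master = new LatticeMasterSketch(cluster);
        SketchShuffleEnvironment env = new LatticeEnvironmentSketch(cluster);

        String descriptor = master.registerPartition("p0");
        env.write(descriptor, "a");
        env.write(descriptor, "b");
        System.out.println(env.read(descriptor));

        // The data outlives the writer; only the master releases it.
        master.releasePartition("p0");
    }
}
```

Because the partition data lives only in the cluster stub, the writing side can go away after its writes without affecting later reads, which is the decoupling of compute and shuffle data lifecycles that the description lists as the first benefit.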



