[ https://issues.apache.org/jira/browse/SPARK-56664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boyang Jerry Peng updated SPARK-56664:
--------------------------------------
    Description: 
The streaming shuffle is an alternative {{ShuffleManager}} implementation 
designed for low-latency, continuously-running queries (for example, real-time 
mode in Structured Streaming). Unlike the default sort-based shuffle, it does 
not materialize map outputs to disk and does not require map tasks to finish 
before reduce tasks can start. Instead, each map task hosts a network server 
that pushes records to reduce tasks as they are produced; reduce tasks open 
clients to those servers and consume records as a stream.
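
The push model described above can be illustrated with a minimal sketch. This is not the Spark implementation and does not use any Spark API: the one-line partition handshake, the tab-separated wire format, the toy partitioner, and every function name here are assumptions made for illustration only. A single "map task" listens on a local TCP socket, each "reduce task" connects and announces its partition, and records are pushed over the connection as soon as they are produced, without being written to disk.

```python
import socket
import threading

NUM_PARTITIONS = 2  # hypothetical number of reduce tasks


def partition_for(key):
    # Toy, deterministic partitioner (an assumption, not Spark's partitioner).
    return len(key) % NUM_PARTITIONS


def run_map_task(server, records):
    """Map side: accept one connection per reducer, then push records."""
    conns = {}
    for _ in range(NUM_PARTITIONS):
        conn, _addr = server.accept()
        # Hypothetical handshake: the reducer sends its partition id as one line.
        with conn.makefile("r") as handshake:
            conns[int(handshake.readline())] = conn
    for key, value in records:
        # Push each record as it is produced; nothing is materialized on disk.
        conns[partition_for(key)].sendall(f"{key}\t{value}\n".encode())
    for conn in conns.values():
        conn.close()  # closing the connection signals end of stream


def run_reduce_task(address, partition, sink):
    """Reduce side: open a client to the map server and consume a stream."""
    with socket.create_connection(address) as sock:
        sock.sendall(f"{partition}\n".encode())
        with sock.makefile("r") as stream:
            for line in stream:  # records arrive while the map task still runs
                key, value = line.rstrip("\n").split("\t")
                sink.append((key, int(value)))


def demo():
    records = [("a", 1), ("bb", 2), ("c", 3), ("dd", 4)]
    server = socket.create_server(("127.0.0.1", 0))  # ephemeral local port
    address = server.getsockname()
    outputs = {p: [] for p in range(NUM_PARTITIONS)}
    threads = [threading.Thread(target=run_map_task, args=(server, records))]
    threads += [
        threading.Thread(target=run_reduce_task, args=(address, p, outputs[p]))
        for p in range(NUM_PARTITIONS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.close()
    return outputs


if __name__ == "__main__":
    print(demo())
```

Map and reduce tasks run concurrently in this sketch, which is the property that distinguishes it from a sort-based shuffle where reducers wait for finished map outputs. Fault tolerance, backpressure, and serialization are deliberately left out.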

 

Design doc:
https://github.com/jerrypeng/spark/blob/1fe0abd72a317e8f73df1406966f8f49b24e8fd1/docs/streaming-shuffle.md

  was:
The streaming shuffle is an alternative {{ShuffleManager}} implementation 
designed for low-latency, continuously-running queries (for example, real-time 
mode in Structured Streaming). Unlike the default sort-based shuffle, it does 
not materialize map outputs to disk and does not require map tasks to finish 
before reduce tasks can start. Instead, each map task hosts a network server 
that pushes records to reduce tasks as they are produced; reduce tasks open 
clients to those servers and consume records as a stream.

 

Design doc:
https://github.com/jerrypeng/spark/blob/ecb48211f821842caf717ebdf069da5505f43f68/docs/streaming-shuffle.md


> Add a new shuffle implementation to support Real-time Mode (RTM)
> ----------------------------------------------------------------
>
>                 Key: SPARK-56664
>                 URL: https://issues.apache.org/jira/browse/SPARK-56664
>             Project: Spark
>          Issue Type: Epic
>          Components: Structured Streaming
>    Affects Versions: 4.2.0
>            Reporter: Boyang Jerry Peng
>            Priority: Major
>
> The streaming shuffle is an alternative {{ShuffleManager}} implementation 
> designed for low-latency, continuously-running queries (for example, 
> real-time mode in Structured Streaming). Unlike the default sort-based 
> shuffle, it does not materialize map outputs to disk and does not require map 
> tasks to finish before reduce tasks can start. Instead, each map task hosts a 
> network server that pushes records to reduce tasks as they are produced; 
> reduce tasks open clients to those servers and consume records as a stream.
>  
> Design doc:
> https://github.com/jerrypeng/spark/blob/1fe0abd72a317e8f73df1406966f8f49b24e8fd1/docs/streaming-shuffle.md



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
