[ https://issues.apache.org/jira/browse/SPARK-54699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boyang Jerry Peng updated SPARK-54699:
--------------------------------------
    Description: 
Real-time mode for Apache Spark Structured Streaming is a new execution model 
designed to significantly lower end-to-end data processing latency, down to the 
order of 100 milliseconds.

 

This epic targets support for stateful queries and PySpark in real-time mode 
(RTM).

 

To support stateful queries, we need to implement the following major components:
 # Streaming Shuffle - a push-based shuffle that allows tasks in upstream stages 
to send their output immediately to tasks in downstream stages, so that data can 
be processed in a pipelined fashion.
 # Concurrent Stage Scheduling - allows multiple stages of a query plan to run at 
the same time, so that, together with the streaming shuffle, data can be 
processed in a pipelined fashion.

 

Previous epic for stateless support in RTM:

https://issues.apache.org/jira/browse/SPARK-53736

 

More details can be found in the SPIP:

[https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing]

 

SPIP approved by the community:

[https://lists.apache.org/thread/k93gj0ko54kcslzkjwp95nqvjnkwcb63] 

  was:
Real-time mode for Apache Spark Structured Streaming is a new execution model 
designed to significantly lower end-to-end data processing latency, down to the 
order of 100 milliseconds.

 

This epic targets support for stateful queries and PySpark in real-time mode 
(RTM).

 

Previous epic for stateless support in RTM:

https://issues.apache.org/jira/browse/SPARK-53736

 

More details can be found in the SPIP:

[https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing]

 

SPIP approved by the community:

[https://lists.apache.org/thread/k93gj0ko54kcslzkjwp95nqvjnkwcb63] 


> Real-time Mode in Structured Streaming (stateful support)
> ---------------------------------------------------------
>
>                 Key: SPARK-54699
>                 URL: https://issues.apache.org/jira/browse/SPARK-54699
>             Project: Spark
>          Issue Type: Epic
>          Components: Structured Streaming
>    Affects Versions: 4.3.0
>            Reporter: Boyang Jerry Peng
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
