[
https://issues.apache.org/jira/browse/SPARK-54699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Boyang Jerry Peng updated SPARK-54699:
--------------------------------------
Epic Name: Real-time Mode in Structured Streaming supporting stateful
queries (was: Real-time Mode in Structured Streaming (Scala stateful and
Pyspark support))
> Real-time Mode in Structured Streaming (stateful support)
> ---------------------------------------------------------
>
> Key: SPARK-54699
> URL: https://issues.apache.org/jira/browse/SPARK-54699
> Project: Spark
> Issue Type: Epic
> Components: Structured Streaming
> Affects Versions: 4.3.0
> Reporter: Boyang Jerry Peng
> Priority: Major
>
> Real-time mode for Apache Spark Structured Streaming is a new execution model
> designed to significantly lower end-to-end data processing latency to the
> order of 100 milliseconds.
>
> This epic targets supporting stateful queries in RTM.
>
> To support stateful queries, we need to implement several major components:
> # Streaming Shuffle - a push-based shuffle that allows tasks from
> upstream stages to immediately send output to tasks from downstream stages so
> that data can be processed in a pipelined fashion.
> # Concurrent Stage scheduling - allows multiple stages of a query plan to
> run at the same time, so that processing can be pipelined in conjunction
> with the streaming shuffle.
>
> Previous epic for stateless support in RTM:
> https://issues.apache.org/jira/browse/SPARK-53736
>
> More details can be found in the SPIP:
> [https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing]
>
> SPIP approved by the community:
> [https://lists.apache.org/thread/k93gj0ko54kcslzkjwp95nqvjnkwcb63]
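
For readers new to the two components named in the description, the idea can be sketched as a toy model in plain Python. This is not Spark code and does not reflect the design in the SPIP; the names `upstream_task`, `downstream_task`, and `run_pipelined` are hypothetical, threads stand in for tasks, and in-memory queues stand in for the shuffle. It only illustrates upstream tasks pushing each record downstream as soon as it is produced, while both stages run at the same time and the downstream stage keeps per-key state.

```python
import queue
import threading
import zlib
from collections import defaultdict

# Hypothetical toy model, not Spark code: each downstream task owns a queue
# that upstream tasks push records into as soon as they are produced.
_DONE = object()  # sentinel marking the end of one upstream task's output


def upstream_task(records, downstream_queues):
    """Push each (key, value) record to the downstream partition that owns the key."""
    n = len(downstream_queues)
    for key, value in records:
        # Deterministic hash partitioning so a key always reaches the same task.
        partition = zlib.crc32(key.encode("utf-8")) % n
        downstream_queues[partition].put((key, value))
    for q in downstream_queues:
        q.put(_DONE)


def downstream_task(in_queue, num_upstream, state):
    """Consume records as they arrive and keep a running sum per key (the state)."""
    finished = 0
    while finished < num_upstream:
        item = in_queue.get()
        if item is _DONE:
            finished += 1
            continue
        key, value = item
        state[key] += value


def run_pipelined(upstream_inputs, num_downstream=2):
    """Start both stages at the same time and return the merged per-key sums."""
    queues = [queue.Queue() for _ in range(num_downstream)]
    states = [defaultdict(int) for _ in range(num_downstream)]
    consumers = [
        threading.Thread(target=downstream_task, args=(q, len(upstream_inputs), s))
        for q, s in zip(queues, states)
    ]
    producers = [
        threading.Thread(target=upstream_task, args=(recs, queues))
        for recs in upstream_inputs
    ]
    # Downstream tasks are started before any upstream output exists,
    # which is the "concurrent stages" part of the sketch.
    for t in consumers + producers:
        t.start()
    for t in producers + consumers:
        t.join()
    merged = {}
    for s in states:
        merged.update(s)  # each key lives in exactly one partition
    return merged
```

Calling `run_pipelined([[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]])` runs two upstream tasks against two downstream tasks and returns the per-key sums. A pull-based shuffle would instead wait for the upstream stage to finish before the downstream stage reads anything.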
--
This message was sent by Atlassian Jira
(v8.20.10#820010)