[
https://issues.apache.org/jira/browse/SPARK-52330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Boyang Jerry Peng updated SPARK-52330:
--------------------------------------
Description:
The SPIP proposes to add a new execution mode called “*Real-time Mode*” in
Spark Structured Streaming that significantly lowers end-to-end latency for
processing streams of data.
Our goal is to make Spark capable of handling streaming jobs that need results
*almost immediately (within O(100) milliseconds)*. We want to
achieve this *without changing the high-level DataFrame/Dataset API* that users
already use – so existing streaming queries can run in this new
ultra-low-latency mode by simply turning it on, without rewriting their logic.
In short, we’re trying to enable Spark to power *real-time applications* (like
instant anomaly alerts or live personalization) that today cannot meet their
latency requirements with Spark’s current streaming engine.
SPIP doc:
[https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing]
was:
We propose to add a *real-time mode* in Spark Structured Streaming that
significantly lowers end-to-end latency for processing streams of data.
Our goal is to make Spark capable of handling streaming jobs that need results
*almost immediately (within O(100) milliseconds)*. We want to
achieve this *without changing the high-level DataFrame/Dataset API* that users
already use – so existing streaming queries can run in this new
ultra-low-latency mode by simply turning it on, without rewriting their logic.
In short, we’re trying to enable Spark to power *real-time applications* (like
instant anomaly alerts or live personalization) that today cannot meet their
latency requirements with Spark’s current streaming engine.
SPIP doc:
[https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing]
> SPIP: Real-Time Mode in Apache Spark Structured Streaming
> ---------------------------------------------------------
>
> Key: SPARK-52330
> URL: https://issues.apache.org/jira/browse/SPARK-52330
> Project: Spark
> Issue Type: Umbrella
> Components: Structured Streaming
> Affects Versions: 4.1.0
> Reporter: Boyang Jerry Peng
> Priority: Major