Hi all,

I want to start a discussion thread for the SPIP titled “Real-Time Mode in
Apache Spark Structured Streaming” that I've been working on with Siying
Dong, Indrajit Roy, Chao Sun, Jungtaek Lim, and Michael Armbrust:

JIRA: <https://issues.apache.org/jira/browse/SPARK-52330>
Doc: <https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing>

The SPIP proposes a new execution mode called “Real-Time Mode” in Spark
Structured Streaming that significantly lowers end-to-end latency for
processing streams of data.

Our goal is to make Spark capable of handling streaming jobs that need
results almost immediately (within O(100) milliseconds). A key principle of
this proposal is compatibility: we want to achieve this without changing
the high-level DataFrame/Dataset API that users already rely on, so
existing streaming queries can run in this new ultra-low-latency mode by
simply turning it on, without rewriting their logic.

In short, we’re trying to enable Spark to power real-time applications
(like instant anomaly alerts or live personalization) that today cannot
meet their latency requirements with Spark’s current streaming engine.

We'd greatly appreciate your feedback, thoughts, and suggestions on this
approach!
