Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

Mich Talebzadeh Wed, 28 May 2025 06:48:11 -0700

Just to add:

A stronger definition of real time: the engineering definition of real time
is, roughly, fast enough to be interactive.

However, I would put a stronger definition. In a real-time application or
with real-time data, there is no such thing as an answer that is late and
correct. Timeliness is part of the application. If I get the right answer
too slowly, it becomes useless or wrong.
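
As a rough sketch of what I mean, and not anything taken from the SPIP, the
deadline can be made part of the result itself. All names here (Timeliness,
TimedResult, withinDeadline) are hypothetical, and the 100 ms deadline in the
usage note is only an assumption borrowed from the O(100) milliseconds target
mentioned in the proposal.

```scala
import scala.concurrent.duration._

object Timeliness {
  // Hypothetical record: a computed value, the time the input event occurred,
  // and the time the result was emitted (both in epoch milliseconds).
  final case class TimedResult[A](value: A, eventTimeMs: Long, emittedAtMs: Long) {
    def latency: FiniteDuration = (emittedAtMs - eventTimeMs).millis
  }

  // Under the stronger definition, a result that misses its deadline is
  // treated as no answer at all, however accurate its value is.
  def withinDeadline[A](r: TimedResult[A], deadline: FiniteDuration): Option[A] =
    if (r.latency <= deadline) Some(r.value) else None
}
```

For example, a result emitted 80 ms after its event passes
Timeliness.withinDeadline(result, 100.millis), while the same value emitted
500 ms after its event yields None.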



Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>





On Wed, 28 May 2025 at 11:10, Mich Talebzadeh <[email protected]>
wrote:

> The current limitations in Spark Structured Streaming (SSS) come from
> micro-batching. If you are going to reduce micro-batching, this reduction
> must be balanced against the available processing capacity of the cluster
> to prevent back pressure and instability. In the case of Continuous
> Processing mode, a specific continuous trigger with a desired checkpoint
> interval, to quote:
>
> "
> df.writeStream
>    .format("...")
>    .option("...")
>    .trigger(Trigger.RealTime("300 Seconds"))    // new trigger type to enable real-time Mode
>    .start()
> This Trigger.RealTime signals that the query should run in the new ultra
> low-latency execution mode.  A time interval can also be specified, e.g.
> “300 Seconds”, to indicate how long each micro-batch should run for.
> "
>
> will inevitably depend on many factors. It is not that simple.
> HTH
>
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
>
>
> On Wed, 28 May 2025 at 05:13, Jerry Peng <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I want to start a discussion thread for the SPIP titled “Real-Time Mode
>> in Apache Spark Structured Streaming” that I've been working on with Siying
>> Dong, Indrajit Roy, Chao Sun, Jungtaek Lim, and Michael Armbrust: [JIRA
>> <https://issues.apache.org/jira/browse/SPARK-52330>] [Doc
>> <https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing>
>> ].
>>
>> The SPIP proposes a new execution mode called “Real-time Mode” in Spark
>> Structured Streaming that significantly lowers end-to-end latency for
>> processing streams of data.
>>
>> A key principle of this proposal is compatibility. Our goal is to make
>> Spark capable of handling streaming jobs that need results almost
>> immediately (within O(100) milliseconds). We want to achieve this without
>> changing the high-level DataFrame/Dataset API that users already use – so
>> existing streaming queries can run in this new ultra-low-latency mode by
>> simply turning it on, without rewriting their logic.
>>
>> In short, we’re trying to enable Spark to power real-time applications
>> (like instant anomaly alerts or live personalization) that today cannot
>> meet their latency requirements with Spark’s current streaming engine.
>>
>> We'd greatly appreciate your feedback, thoughts, and suggestions on this
>> approach!
>>
>>
