Just to add a stronger definition of real time. The engineering definition of real time is, roughly, "fast enough to be interactive".
However, I would put forward a stronger definition: in a real-time application, there is no such thing as an answer that is late but correct. Timeliness is part of the application; if I get the right answer too slowly, it becomes useless or wrong.

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

On Wed, 28 May 2025 at 11:10, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> The current limitations in Spark Structured Streaming (SSS) come from
> micro-batching. If you are going to reduce micro-batching, this reduction
> must be balanced against the available processing capacity of the cluster
> to prevent back pressure and instability. In the case of Continuous
> Processing mode, choosing a specific continuous trigger with a desired
> checkpoint interval, as in this example quoted from the proposal,
>
> "
> df.writeStream
>   .format("...")
>   .option("...")
>   .trigger(Trigger.RealTime("300 Seconds")) // new trigger type to enable real-time mode
>   .start()
>
> This Trigger.RealTime signals that the query should run in the new ultra
> low-latency execution mode. A time interval can also be specified, e.g.
> "300 Seconds", to indicate how long each micro-batch should run for.
> "
>
> will inevitably depend on many factors.
> It is not that simple.
>
> HTH
>
> On Wed, 28 May 2025 at 05:13, Jerry Peng <jerry.boyang.p...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I want to start a discussion thread for the SPIP titled “Real-Time Mode
>> in Apache Spark Structured Streaming” that I've been working on with
>> Siying Dong, Indrajit Roy, Chao Sun, Jungtaek Lim, and Michael Armbrust:
>> [JIRA <https://issues.apache.org/jira/browse/SPARK-52330>]
>> [Doc <https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing>].
>>
>> The SPIP proposes a new execution mode called “Real-time Mode” in Spark
>> Structured Streaming that significantly lowers end-to-end latency for
>> processing streams of data.
>>
>> A key principle of this proposal is compatibility. Our goal is to make
>> Spark capable of handling streaming jobs that need results almost
>> immediately (within O(100) milliseconds). We want to achieve this without
>> changing the high-level DataFrame/Dataset API that users already use, so
>> existing streaming queries can run in this new ultra-low-latency mode
>> simply by turning it on, without rewriting their logic.
>>
>> In short, we’re trying to enable Spark to power real-time applications
>> (like instant anomaly alerts or live personalization) that today cannot
>> meet their latency requirements with Spark’s current streaming engine.
>>
>> We'd greatly appreciate your feedback, thoughts, and suggestions on this
>> approach!
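The two concerns raised in this thread, that shrinking the micro-batch interval must be balanced against cluster capacity and that a correct answer delivered too late is a wrong answer, can be illustrated with a toy model. The sketch below is plain Scala, not Spark, and it assumes a constant arrival rate, a constant processing capacity and a fixed trigger interval; the names MicroBatchSketch, simulate and deadlineMs are hypothetical. Each simulated batch reports its leftover backlog and a rough latency estimate, and is marked on time only if that estimate is within the deadline.

```scala
// Toy model only: plain Scala, not Spark. All names here are hypothetical.
// Assumes a constant arrival rate, a constant processing capacity and a fixed trigger interval.
object MicroBatchSketch {

  /** Outcome of one simulated micro-batch. */
  final case class Batch(index: Int, backlogRecords: Long, latencyMs: Long, onTime: Boolean)

  /**
   * Simulates `numBatches` fixed-interval triggers.
   *
   * @param arrivalPerSec  records arriving per second (assumed constant)
   * @param capacityPerSec records the cluster can process per second (assumed constant, > 0)
   * @param triggerMs      trigger interval in milliseconds
   * @param deadlineMs     latency above which a result counts as late, and therefore wrong
   */
  def simulate(
      arrivalPerSec: Long,
      capacityPerSec: Long,
      triggerMs: Long,
      deadlineMs: Long,
      numBatches: Int
  ): Vector[Batch] = {
    val arrivalsPerBatch = arrivalPerSec * triggerMs / 1000
    val capacityPerBatch = capacityPerSec * triggerMs / 1000

    // scanLeft carries the unprocessed backlog from one batch into the next.
    val backlogs = (1 to numBatches)
      .scanLeft(0L) { (carried, _) =>
        val queued = carried + arrivalsPerBatch
        math.max(0L, queued - capacityPerBatch)
      }
      .tail

    backlogs.zipWithIndex.map { case (backlog, i) =>
      // Rough latency estimate: one trigger interval of batching delay plus the
      // time needed to drain whatever is still queued after this batch.
      val latencyMs = triggerMs + backlog * 1000 / capacityPerSec
      Batch(i + 1, backlog, latencyMs, latencyMs <= deadlineMs)
    }.toVector
  }

  def main(args: Array[String]): Unit = {
    // 1,500 records/s arriving against 1,200 records/s of capacity, 500 ms trigger, 1 s deadline.
    simulate(1500, 1200, 500, 1000, 5).foreach(println)
  }
}
```

With 1,500 records/s arriving against 1,200 records/s of capacity and a 500 ms trigger, the backlog grows by 150 records per batch, the estimated latency climbs from 625 ms to 1,125 ms over five batches, and the fifth result misses the 1,000 ms deadline; at 1,000 records/s every batch drains fully and stays at 500 ms. Shortening the trigger to 100 ms does not rescue the overloaded case in this model: the backlog still grows by 300 records per second, because once arrivals exceed capacity, slicing the stream more finely only changes how the delay is distributed, not whether it accumulates.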