[
https://issues.apache.org/jira/browse/SPARK-25399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jungtaek Lim updated SPARK-25399:
---------------------------------
Fix Version/s: (was: 3.0.0)
> Reusing execution threads from continuous processing for microbatch streaming
> can result in correctness issues
> --------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-25399
> URL: https://issues.apache.org/jira/browse/SPARK-25399
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.4.0
> Reporter: Mukul Murthy
> Assignee: Mukul Murthy
> Priority: Critical
> Labels: correctness
> Fix For: 2.4.0
>
>
> Continuous processing sets some thread-local variables that, when read by a
> thread running a microbatch stream, may cause incorrect or no previous state
> to be read, producing wrong answers. This was caught by a job running the
> StreamSuite tests, and it reproduces only occasionally, when the same
> threads are reused.
> The issue is in StateStoreRDD.compute: when we compute currentVersion, we
> read from a thread-local variable that is set by continuous processing
> threads. If this value is set, we then think we're on the wrong state version.
> I imagine very few people, if any, would run into this bug, because you'd
> have to use continuous processing and then microbatch processing in the same
> cluster. However, it can result in silent correctness issues, and it would be
> very difficult for someone to tell if they were impacted by this or not.
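
The failure mode described above can be illustrated with a minimal sketch outside Spark. This is not the Spark code and not necessarily the fix that was applied. Every name below (`_epoch`, `continuous_task`, `current_version_buggy`, `current_version_guarded`, the `is_continuous` flag) is hypothetical, and Python's `threading.local` only stands in for the JVM thread-local that `StateStoreRDD.compute` reads. The sketch shows a value set by one task leaking into a later task on the same pooled thread, and one possible way to guard against it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the thread-local value set by continuous processing.
_epoch = threading.local()


def continuous_task(epoch):
    """Simulates a continuous-processing task: sets the thread-local, never clears it."""
    _epoch.value = epoch


def current_version_buggy(batch_version):
    """Prefers the thread-local whenever it is set, even in a microbatch task."""
    leaked = getattr(_epoch, "value", None)
    return batch_version if leaked is None else leaked


def current_version_guarded(batch_version, is_continuous):
    """Consults the thread-local only when the caller says it is a continuous task."""
    if is_continuous:
        return getattr(_epoch, "value", batch_version)
    return batch_version


# A single worker guarantees both kinds of task run on the same pooled thread.
with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(continuous_task, 7).result()
    leaked_version = pool.submit(current_version_buggy, 3).result()
    guarded_version = pool.submit(current_version_guarded, 3, False).result()

print(leaked_version)   # 7: the stale value from the earlier task wins
print(guarded_version)  # 3: the microbatch version is used as intended
```

With more than one worker the leak would appear only when the second task happens to land on the thread that ran the first, which matches the occasional reproduction described in the report. Clearing the thread-local in a `finally` block when the continuous task ends would be another option.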