[ https://issues.apache.org/jira/browse/SPARK-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tathagata Das resolved SPARK-10071.
-----------------------------------
Resolution: Fixed
Assignee: Shixiong Zhu
Fix Version/s: 1.5.1, 1.6.0, 1.4.2
> QueueInputDStream Should Allow Checkpointing
> --------------------------------------------
>
> Key: SPARK-10071
> URL: https://issues.apache.org/jira/browse/SPARK-10071
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.4.1, 1.5.0
> Reporter: Asim Jalis
> Assignee: Shixiong Zhu
> Fix For: 1.4.2, 1.6.0, 1.5.1
>
>
> I would like https://issues.apache.org/jira/browse/SPARK-8630 to be
> reverted and resolved as Won’t Fix, and QueueInputDStream to return to its
> old behavior of not throwing an exception when checkpointing is enabled.
>
> Why? Because that fix, which throws an exception if the DStream is being
> checkpointed, breaks the primary use case for QueueInputDStream: testing.
> For example, the Spark Streaming documentation recommends using
> QueueInputDStream for testing.
>
> Why does throwing an exception when checkpointing is used break this class?
> If I use updateStateByKey or windowing operations such as
> reduceByKeyAndWindow with an inverse reduce function, the StreamingContext
> requires checkpointing and throws an exception if I don’t enable it. But if
> I do enable checkpointing, this class throws an exception saying that I
> cannot use checkpointing with the queue stream. As a result, I cannot use
> QueueInputDStream to test windowing operations or updateStateByKey; it can
> only be used for trivial stateless DStreams.
>
> But would removing the exception-throwing logic make this code fragile? It
> should not. In the testing scenario, the RDDs passed into the
> QueueInputDStream are created through parallelize and are checkpointable.
>
> But what about people who use QueueInputDStream outside of tests, with
> non-recoverable RDDs? Perhaps a warning suffices here: checkpointing will
> not be able to recover state if their RDDs are not recoverable, and it is
> up to them how to handle that. Since we currently have no good way of
> determining whether a QueueInputDStream contains recoverable RDDs, why not
> err on the side of leaving it to the user of the class not to expect
> recoverability, rather than prohibiting checkpointing outright?
>
> In conclusion: my recommendation is to revert to the old behavior and to
> resolve SPARK-8630 as Won’t Fix.
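A minimal reproduction of the scenario in the report might look like the following Scala sketch. It assumes the Spark 1.x streaming API (StreamingContext, queueStream, updateStateByKey, foreachRDD) running against a local master; the object QueueStreamStateRepro, the helper runningCounts, and the temporary checkpoint directory are hypothetical names chosen for illustration and are not part of the issue. Per the report, on 1.4.1 and 1.5.0 this job fails because checkpointing is enabled on a queue stream; on a fixed version it would be expected to finish with the counts a -> 2 and b -> 1.

```scala
import java.nio.file.Files
import java.util.concurrent.atomic.AtomicReference

import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical reproduction of the SPARK-10071 testing scenario (Spark 1.x API).
object QueueStreamStateRepro {

  // Running count per key: new values for this batch plus the previous state.
  val updateCount: (Seq[Int], Option[Int]) => Option[Int] =
    (values, state) => Some(values.sum + state.getOrElse(0))

  // Runs a short local streaming job and returns the last state snapshot it saw.
  def runningCounts(): Map[String, Int] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("QueueStreamStateRepro")
    val ssc = new StreamingContext(conf, Seconds(1))

    // updateStateByKey will not run without a checkpoint directory.
    ssc.checkpoint(Files.createTempDirectory("queue-stream-checkpoint").toString)

    // Test input built with parallelize; queueStream consumes one RDD per batch by default.
    val queue = mutable.Queue[RDD[(String, Int)]]()
    queue.enqueue(ssc.sparkContext.parallelize(Seq(("a", 1), ("b", 1))))
    queue.enqueue(ssc.sparkContext.parallelize(Seq(("a", 1))))

    // Driver-side holder for the most recent state, updated once per batch.
    val latest = new AtomicReference(Map.empty[String, Int])
    ssc.queueStream(queue)
      .updateStateByKey[Int](updateCount)
      .foreachRDD(rdd => latest.set(rdd.collect().toMap))

    // According to the report, on 1.4.1 and 1.5.0 combining checkpointing with a
    // queue stream makes this job fail with an exception.
    ssc.start()
    ssc.awaitTerminationOrTimeout(10000)
    ssc.stop()
    latest.get
  }

  def main(args: Array[String]): Unit = println(runningCounts())
}
```

The input mirrors the testing use case the report describes: the RDDs come from parallelize, so they can be recomputed, and the checkpoint directory exists only because updateStateByKey demands one.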
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)