[ https://issues.apache.org/jira/browse/SPARK-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697996#comment-14697996 ]
Asim Jalis commented on SPARK-8630:
-----------------------------------
I would like to recommend that this fix be reverted and the issue resolved as
won’t fix.
Why? This fix, which throws an exception if the DStream is being
checkpointed, breaks the primary use case for QueueInputDStream: testing.
For example, the Spark Streaming documentation recommends using
QueueInputDStream for testing.
Why does throwing an exception when checkpointing is enabled break this class?
If I use windowing operations or updateStateByKey, the StreamingContext
requires that I enable checkpointing, and it throws an exception if I don’t.
But if I do enable checkpointing, this class throws an exception saying that I
cannot use checkpointing with the queue stream. The end result is that I cannot
use QueueInputDStream to test windowing operations or updateStateByKey. It can
only be used for trivial stateless DStreams.
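The conflict described above can be illustrated with a small stand-alone
model. This is a minimal sketch in plain Python, not Spark code: ToyContext,
validate, can_start, and the error messages are hypothetical names invented
for illustration, and they only mimic the two checks as this comment describes
them.

```python
class ToyContext:
    """Hypothetical stand-in for a streaming context; not the Spark API."""

    def __init__(self, uses_queue_stream, uses_stateful_ops, checkpointing):
        self.uses_queue_stream = uses_queue_stream
        self.uses_stateful_ops = uses_stateful_ops  # e.g. updateStateByKey
        self.checkpointing = checkpointing

    def validate(self):
        # Check 1: stateful operations demand checkpointing.
        if self.uses_stateful_ops and not self.checkpointing:
            raise ValueError("stateful operations require checkpointing")
        # Check 2 (the behaviour added by the fix): queue streams reject it.
        if self.uses_queue_stream and self.checkpointing:
            raise ValueError("queue stream cannot be checkpointed")


def can_start(ctx):
    """Return True if validation passes, False if either check rejects."""
    try:
        ctx.validate()
        return True
    except ValueError:
        return False


# A stateful test over a queue stream is rejected with checkpointing off or on.
stateful_results = [
    can_start(ToyContext(True, True, checkpointing=flag)) for flag in (False, True)
]
# A stateless test over a queue stream still passes without checkpointing.
stateless_ok = can_start(ToyContext(True, False, checkpointing=False))
```

In this toy model no checkpointing setting lets the stateful case through,
while the stateless case is unaffected, which is the situation the paragraph
above describes.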
But would removing the exception-throwing logic make this code fragile? It
should not. In the testing scenario, the RDD passed into the
QueueInputDStream is created through parallelize, and it is checkpointable.
But what about people who are using QueueInputDStream in non-testing scenarios
with non-recoverable RDDs? Perhaps a warning suffices here, stating that
checkpointing will not be able to recover state if their RDDs are
non-recoverable. It is then up to them to resolve the situation.
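The warn-instead-of-throw idea can be sketched as follows. This is plain
Python using the standard warnings module, not Spark code; the function
check_queue_stream, its strict flag, and the message text are assumptions made
up for this illustration.

```python
import warnings

MESSAGE = "queue stream RDDs may not be recoverable from a checkpoint"


def check_queue_stream(checkpointing, strict):
    """Hypothetical check: raise when strict, otherwise only warn."""
    if not checkpointing:
        return
    if strict:
        # Models the current behaviour: checkpointing is rejected outright.
        raise ValueError(MESSAGE)
    # Models the proposed behaviour: inform the user and carry on.
    warnings.warn(MESSAGE, UserWarning)


# Capture warnings so the lenient path can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_queue_stream(checkpointing=True, strict=False)

warned = [str(w.message) for w in caught]
```

With strict=False the caller is told about the recoverability risk and
continues to run; with strict=True the same condition stops the program.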
Since we currently have no good way of determining whether a QueueInputDStream
contains recoverable RDDs, why not err on the side of leaving it to the user of
the class not to expect recoverability, rather than blocking checkpointing
outright?
In conclusion: my recommendation would be to revert to the old behavior and to
resolve this bug as won’t fix.
> Prevent from checkpointing QueueInputDStream
> --------------------------------------------
>
> Key: SPARK-8630
> URL: https://issues.apache.org/jira/browse/SPARK-8630
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Reporter: Shixiong Zhu
> Assignee: Shixiong Zhu
> Fix For: 1.4.1, 1.5.0
>
>
> It's better to prevent checkpointing of QueueInputDStream than to fail the
> application when recovering `QueueInputDStream`, so that people can find the
> issue as early as possible. See SPARK-8553 for an example.