[ https://issues.apache.org/jira/browse/BEAM-610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510576#comment-15510576 ]
ASF GitHub Bot commented on BEAM-610: ------------------------------------- Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/909 > Enable spark's checkpointing mechanism for driver-failure recovery in > streaming > ------------------------------------------------------------------------------- > > Key: BEAM-610 > URL: https://issues.apache.org/jira/browse/BEAM-610 > Project: Beam > Issue Type: Bug > Components: runner-spark > Reporter: Amit Sela > Assignee: Amit Sela > > For streaming applications, Spark provides a checkpoint mechanism useful for > stateful processing and driver failures. See: > https://spark.apache.org/docs/1.6.2/streaming-programming-guide.html#checkpointing > This requires the "lambdas", or the content of DStream/RDD functions to be > Serializable - currently, the runner a lot of the translation work in > streaming to the batch translator, which can no longer be the case because it > passes along non-serializables. > This also requires wrapping the creation of the streaming application's graph > in a "getOrCreate" manner. See: > https://spark.apache.org/docs/1.6.2/streaming-programming-guide.html#how-to-configure-checkpointing > Another limitation is the need to wrap Accumulators and Broadcast variables > in Singletons in order for them to be re-created once stale after recovery. > This work is a prerequisite to support PerKey workflows, which will be > support via Spark's stateful operators such as mapWithState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)