[ 
https://issues.apache.org/jira/browse/BEAM-757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584706#comment-15584706
 ] 

Amit Sela commented on BEAM-757:
--------------------------------

I have something (working!) using the {{SimpleDoFnRunner}} here: 
https://github.com/amitsela/incubator-beam/blob/BEAM-757-WIP/runners/spark/src/main/java/org/apache/beam/runners/spark/util/SparkDoFnRunner.java

I had to expose OldDoFn and OutputManager for that 
(https://github.com/amitsela/incubator-beam/blob/BEAM-757-WIP/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java).

As for OldDoFn - I had to call setup() and teardown() myself, since the 
DoFnRunner didn't seem to do that. Maybe it should? (Spark might need to 
override this anyway for teardown after finishBundle, but still.)
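
A minimal sketch of the call order described above, where the caller (not the 
runner) is responsible for setup() and teardown(). The Fn interface, 
RecordingFn and processPartition are hypothetical stand-ins for illustration, 
not the actual OldDoFn or DoFnRunner API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LifecycleSketch {

    /** Hypothetical stand-in for a DoFn with explicit lifecycle hooks. */
    interface Fn<I, O> {
        void setup();
        void startBundle();
        O processElement(I input);
        void finishBundle();
        void teardown();
    }

    /** Records every lifecycle call so the order can be inspected. */
    static final class RecordingFn implements Fn<String, String> {
        final List<String> calls = new ArrayList<>();

        @Override public void setup() { calls.add("setup"); }
        @Override public void startBundle() { calls.add("startBundle"); }
        @Override public String processElement(String input) {
            calls.add("process:" + input);
            return input.toUpperCase();
        }
        @Override public void finishBundle() { calls.add("finishBundle"); }
        @Override public void teardown() { calls.add("teardown"); }
    }

    /**
     * Processes one partition (roughly a bundle). The caller invokes setup()
     * before the bundle and teardown() after finishBundle(), even on failure.
     */
    static <I, O> List<O> processPartition(Fn<I, O> fn, Iterable<I> partition) {
        List<O> results = new ArrayList<>();
        fn.setup();
        try {
            fn.startBundle();
            for (I element : partition) {
                results.add(fn.processElement(element));
            }
            fn.finishBundle();
        } finally {
            fn.teardown();
        }
        return results;
    }

    public static void main(String[] args) {
        RecordingFn fn = new RecordingFn();
        System.out.println(processPartition(fn, Arrays.asList("a", "b")));
        System.out.println(fn.calls);
    }
}
```

The try/finally is a design choice of this sketch, so that teardown() still 
runs if an element fails; whether the real runner should behave that way is 
the open question raised above.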

OutputManager needed to be exposed to allow the runner to access the output, 
and in Spark's case clear it in-between element processing because Spark 
partitions (~bundles) can be quite big.
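
To illustrate why the runner needs access to the output: a rough sketch in 
which outputs are buffered only for the current element and handed downstream 
before the next input is read, so memory does not grow with partition size. 
BufferingOutput and process are hypothetical names, not the real 
OutputManager API, and the sketch drains the buffer lazily rather than 
calling an explicit clear:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.function.BiConsumer;

final class DrainingOutputSketch {

    /** Hypothetical output manager: holds the outputs of the current element only. */
    static final class BufferingOutput<O> {
        private final Queue<O> buffer = new ArrayDeque<>();

        void output(O value) { buffer.add(value); }
        boolean isEmpty() { return buffer.isEmpty(); }
        O poll() { return buffer.poll(); }
    }

    /**
     * Lazily applies fn to each input. The next input is read only after the
     * outputs of the previous one have been consumed, so the buffer never
     * holds more than one element's worth of output.
     */
    static <I, O> Iterator<O> process(
            Iterator<I> inputs, BiConsumer<I, BufferingOutput<O>> fn) {
        BufferingOutput<O> out = new BufferingOutput<>();
        return new Iterator<O>() {
            @Override public boolean hasNext() {
                // Skip over inputs that produce no output at all.
                while (out.isEmpty() && inputs.hasNext()) {
                    fn.accept(inputs.next(), out);
                }
                return !out.isEmpty();
            }

            @Override public O next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return out.poll();
            }
        };
    }

    public static void main(String[] args) {
        // Split each line into words; the empty line yields no output.
        Iterator<String> words = process(
            Arrays.asList("a b", "", "c").iterator(),
            (String line, BufferingOutput<String> o) -> {
                for (String w : line.split(" ")) {
                    if (!w.isEmpty()) {
                        o.output(w);
                    }
                }
            });
        while (words.hasNext()) {
            System.out.println(words.next());
        }
    }
}
```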

Other than that, it was pretty straightforward for me to use it. I'll open a 
PR once the pending PRs are merged.

> The SparkRunner should utilize the SDK's DoFnRunner instead of writing its 
> own.
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-757
>                 URL: https://issues.apache.org/jira/browse/BEAM-757
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-spark
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>
> The SDK now provides DoFnRunner implementations, and so to avoid maintaining 
> against the SDK, the runner should leverage the runner API instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
