[
https://issues.apache.org/jira/browse/BEAM-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Burke reassigned BEAM-6372:
----------------------------------
Assignee: (was: Robert Burke)
> Direct Runner should marshal data in a similar way to Dataflow runner
> ---------------------------------------------------------------------
>
> Key: BEAM-6372
> URL: https://issues.apache.org/jira/browse/BEAM-6372
> Project: Beam
> Issue Type: Improvement
> Components: runner-direct, sdk-go
> Reporter: Andrew Brampton
> Priority: Major
>
> I tested my pipeline using the direct runner, and it happily ran on a
> sample. When I ran it on the Dataflow runner, it ran for an hour, then
> reached a stage that crashed like so:
>
> {quote}java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> Error received from SDK harness for instruction -224: execute failed: panic:
> reflect: Call using main.HistogramResult as type struct { Key string
> "json:\"key\""; Files []string "json:\"files\""; Histogram
> palette.ColorHistogram "json:\"histogram,omitempty\""; Palette []struct { R
> uint8; G uint8; B uint8; A uint8 } "json:\"palette\"" } goroutine 70
> [running]:{quote}
> This was because I forgot to register my HistogramResult type.
> It would be useful if the direct runner tried to marshal and unmarshal all
> types, to help expose issues like this earlier.
> Also, when running on Dataflow, flags and captured variables hold their
> empty/default values. It would be good if the direct runner reproduced this
> behaviour too. For example:
> {code}
> prefix := "X"
> s = s.Scope("Prefix " + prefix)
> // prefix is captured by the closure; its value is not serialized.
> c = beam.ParDo(s, func(value string) string {
>     return prefix + value
> }, c)
> {code}
> This will prefix "X" on the Direct runner, but will prefix "" on Dataflow.
> It is subtle behaviour, but I suspect the direct runner could expose it if it
> marshalled and unmarshalled the func like the Dataflow runner.
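
The round trip described above can be sketched with only the Go standard library. This is an illustration under stated assumptions, not the Beam implementation: `PrefixFn`, `roundTrip`, and `canMarshal` are hypothetical names, and JSON stands in for whatever encoding a runner actually uses. It shows why state held in an exported struct field can survive being shipped to a worker, while a closure cannot be encoded at all.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PrefixFn is a hypothetical struct-based transform. Its configuration lives
// in an exported field, so it can be encoded and decoded.
type PrefixFn struct {
	Prefix string `json:"prefix"`
}

// ProcessElement prepends the configured prefix to value.
func (f *PrefixFn) ProcessElement(value string) string {
	return f.Prefix + value
}

// roundTrip simulates shipping a transform to a remote worker: it encodes fn
// and decodes the bytes into a fresh value, as a marshalling runner might.
func roundTrip(fn *PrefixFn) (*PrefixFn, error) {
	data, err := json.Marshal(fn)
	if err != nil {
		return nil, err
	}
	out := &PrefixFn{}
	if err := json.Unmarshal(data, out); err != nil {
		return nil, err
	}
	return out, nil
}

// canMarshal reports whether v can be encoded as JSON.
func canMarshal(v interface{}) bool {
	_, err := json.Marshal(v)
	return err == nil
}

func main() {
	prefix := "X"

	// A closure carries prefix in captured state that cannot be encoded.
	closure := func(value string) string { return prefix + value }
	fmt.Println(canMarshal(closure)) // false

	// The struct carries the same state in a field that survives the trip.
	remote, err := roundTrip(&PrefixFn{Prefix: prefix})
	if err != nil {
		panic(err)
	}
	fmt.Println(remote.ProcessElement("value")) // Xvalue
}
```

A direct runner that performed a comparable encode/decode step before executing each transform would surface both problems from this issue locally: values that cannot be encoded, and state that silently reverts to its zero value.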