[ https://issues.apache.org/jira/browse/BEAM-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on BEAM-6372 started by null.
----------------------------------

> Direct Runner should marshal data in a similar way to Dataflow runner
> ---------------------------------------------------------------------
>
>                 Key: BEAM-6372
>                 URL: https://issues.apache.org/jira/browse/BEAM-6372
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-direct, sdk-go
>            Reporter: Andrew Brampton
>            Priority: P3
>
> I would test my pipeline using the direct runner, and it would happily run on
> a sample. When I ran it on the Dataflow runner, it would run for an hour, then
> get to a stage that would crash like so:
>
> {quote}java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> Error received from SDK harness for instruction -224: execute failed: panic:
> reflect: Call using main.HistogramResult as type struct \{ Key string
> "json:\"key\""; Files []string "json:\"files\""; Histogram
> palette.ColorHistogram "json:\"histogram,omitempty\""; Palette []struct { R
> uint8; G uint8; B uint8; A uint8 } "json:\"palette\"" } goroutine 70
> [running]:{quote}
> This was because I forgot to register my HistogramResult type.
> It would be useful if the direct runner tried to marshal and unmarshal all
> types, to help expose issues like this earlier.
>
> Also, when running on Dataflow, the values of flags and captured variables
> would be the empty/default value. It would be good if the direct runner also
> caused this behaviour. For example:
> {code}
> prefix := "X"
> s = s.Scope("Prefix " + prefix)
> c = beam.ParDo(s, func(value string) string {
>     return prefix + value
> }, c)
> {code}
> This will prefix values with "X" on the direct runner, but with "" on Dataflow.
> This is subtle behaviour, but I suspect the direct runner could expose it if it
> marshalled and unmarshalled the func like the Dataflow runner.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
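
The marshal/unmarshal round trip requested in the issue can be illustrated with a minimal, standard-library-only sketch. It does not use Beam and makes no claim about how the direct runner or the Dataflow runner actually serialize functions. The names PrefixFn, hiddenPrefixFn and RoundTrip are hypothetical, and encoding/json stands in for whatever encoding a real runner uses. The point is only that state which is part of the serialized form survives the trip, while state that is not comes back as its zero value, much like the empty prefix described above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PrefixFn is a hypothetical struct-based function object. Its exported
// field is part of the serialized form, so it survives a round trip.
type PrefixFn struct {
	Prefix string `json:"prefix"`
}

// Process prepends the configured prefix to value.
func (f PrefixFn) Process(value string) string {
	return f.Prefix + value
}

// hiddenPrefixFn keeps its state in an unexported field, which
// encoding/json skips. It stands in for state that is never serialized.
type hiddenPrefixFn struct {
	prefix string
}

// Process prepends the (possibly lost) prefix to value.
func (f hiddenPrefixFn) Process(value string) string {
	return f.prefix + value
}

// RoundTrip marshals v to JSON and unmarshals it into a fresh value of the
// same type, mimicking a hand-off from the submitting process to a worker.
func RoundTrip[T any](v T) (T, error) {
	var out T
	data, err := json.Marshal(v)
	if err != nil {
		return out, err
	}
	err = json.Unmarshal(data, &out)
	return out, err
}

func main() {
	kept, err := RoundTrip(PrefixFn{Prefix: "X"})
	if err != nil {
		panic(err)
	}
	lost, err := RoundTrip(hiddenPrefixFn{prefix: "X"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", kept.Process("value")) // prints "Xvalue"
	fmt.Printf("%q\n", lost.Process("value")) // prints "value": the prefix was lost
}
```

A local runner that applied a round trip like this to every function and element before executing it would surface the lost prefix on the first test run, rather than an hour into a distributed job.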