Andrew Brampton created BEAM-6372:
-------------------------------------

             Summary: Direct Runner should marshal data in a similar way to 
Dataflow runner
                 Key: BEAM-6372
                 URL: https://issues.apache.org/jira/browse/BEAM-6372
             Project: Beam
          Issue Type: Improvement
          Components: sdk-go
            Reporter: Andrew Brampton
            Assignee: Robert Burke


I would test my pipeline using the direct runner, and it would happily run on a 
sample. When I ran it on the Dataflow runner, it'll run for a hour, then get to 
a stage that would crash like so:
 
{quote}java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
Error received from SDK harness for instruction -224: execute failed: panic: 
reflect: Call using main.HistogramResult as type struct \{ Key string 
"json:\"key\""; Files []string "json:\"files\""; Histogram 
palette.ColorHistogram "json:\"histogram,omitempty\""; Palette []struct { R 
uint8; G uint8; B uint8; A uint8 } "json:\"palette\"" } goroutine 70 
[running]:{quote}

This was because I forgot to register my HistogramResult type.

It would be useful if the direct runner tried to marshal and unmarshal all 
types, to help expose issues like this earlier.

Also, when running on Dataflow, the value of flags, and captured variables, 
would be the empty/default value. It would be good if direct also caused this 
behaviour. For example:

{code}
prefix := “X”
s = s.Scope(“Prefix ” + prefix)
c = beam.ParDo(s, func(value string) string {
        return prefix + value
}, c)
{code}

Will work prefix "X" on the Direct runner, but will prefix "" on Dataflow. 
Subtle behaviour, but I suspect the direct runner could expose this if it 
marshalled and unmarshalled the func like the dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to