Ben Chambers created BEAM-681:
---------------------------------

             Summary: DoFns should be serialized at apply time and deserialized 
when executing
                 Key: BEAM-681
                 URL: https://issues.apache.org/jira/browse/BEAM-681
             Project: Beam
          Issue Type: Improvement
          Components: sdk-py
            Reporter: Ben Chambers
            Assignee: Frances Perry


1. Serializing DoFns at application time ensures that any modification of 
fields within the DoFn after application cannot accidentally pollute the 
execution. This mirrors the approach taken in Java to approximate a lexical 
closure (e.g., you only need to know the state of the DoFn at the time it was 
applied, not afterwards, to understand its behavior).

2. Based on 1, the DirectRunner should also deserialize DoFns before running 
them, which should also detect other classes of errors, such as using the 
pipeline object (which is not pickleable) within the DoFn.
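
A minimal sketch of the two points above, using only the standard-library
pickle module. AddOffsetFn, BadFn, AppliedTransform and try_apply are
hypothetical stand-ins invented for illustration; they are not Beam classes,
and the SDK's actual pickling mechanism may differ. A threading.Lock stands in
for an unpicklable object such as the pipeline.

```python
import pickle
import threading


class AddOffsetFn:
    """Hypothetical stand-in for a DoFn (not the Beam class)."""

    def __init__(self, offset):
        self.offset = offset

    def process(self, element):
        return [element + self.offset]


class AppliedTransform:
    """Hypothetical holder that snapshots the fn when it is applied."""

    def __init__(self, fn):
        # Point 1: serialize at apply time, so later mutation of `fn`
        # cannot leak into execution.
        self.serialized_fn = pickle.dumps(fn)

    def run(self, elements):
        # Point 2: deserialize at execution time, as a runner would.
        fn = pickle.loads(self.serialized_fn)
        return [out for e in elements for out in fn.process(e)]


class BadFn:
    """Hypothetical fn that captures an unpicklable object."""

    def __init__(self, resource):
        self.resource = resource  # stands in for the pipeline object


def try_apply(fn):
    """Return the serialization error raised at apply time, or None."""
    try:
        AppliedTransform(fn)
        return None
    except (pickle.PicklingError, TypeError, AttributeError) as exc:
        return exc


fn = AddOffsetFn(offset=1)
applied = AppliedTransform(fn)
fn.offset = 100                   # mutation after application
result = applied.run([1, 2, 3])   # still computed with offset=1

error = try_apply(BadFn(threading.Lock()))  # fails while applying
```

In this sketch the mutation to fn.offset has no effect on result, and the
unpicklable field is reported when the transform is applied rather than
partway through execution.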



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
