Charles Chen created BEAM-3566:
----------------------------------

             Summary: Replace Python DirectRunner apply_* hooks with PTransformOverrides
                 Key: BEAM-3566
                 URL: https://issues.apache.org/jira/browse/BEAM-3566
             Project: Beam
          Issue Type: Improvement
          Components: sdk-py-core
    Affects Versions: 2.2.0
            Reporter: Charles Chen
            Assignee: Charles Chen


In the Python DirectRunner, we currently use apply_* hooks to override the 
default .expand() implementation of certain transforms.  For example, 
GroupByKey has a special implementation in the DirectRunner, so we use an 
apply_* hook to replace the implementation of GroupByKey.expand().
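
To make the hook mechanism concrete, here is a minimal, self-contained sketch 
of name-based dispatch.  It is a toy model, not Beam code: the Runner, 
ToyDirectRunner and GroupByKey classes below are hypothetical stand-ins, and 
the lookup rule (find a method named apply_<TransformClassName>) is an 
assumption made for illustration.

```python
class GroupByKey:
    """Hypothetical stand-in for a transform with a default expansion."""

    def expand(self, pcoll):
        return ("default GroupByKey expansion", pcoll)


class Runner:
    """Toy base runner: prefers an apply_<TransformName> hook over expand()."""

    def apply(self, transform, pcoll):
        # Look for a runner method named after the transform's class.
        hook = getattr(self, "apply_" + type(transform).__name__, None)
        if hook is not None:
            # The hook runs immediately, while the graph is being built.
            return hook(transform, pcoll)
        return transform.expand(pcoll)


class ToyDirectRunner(Runner):
    """Toy runner that supplies its own GroupByKey implementation."""

    def apply_GroupByKey(self, transform, pcoll):
        return ("direct-runner GroupByKey", pcoll)


eager_result = ToyDirectRunner().apply(GroupByKey(), "input")
default_result = Runner().apply(GroupByKey(), "input")
```

In this sketch the runner-specific result is baked in at the moment apply() is 
called, which is the eager behavior described in this issue.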

However, this strategy has drawbacks.  Because this override operation happens 
eagerly during graph construction, the pipeline graph is specialized and 
modified before a specific runner is bound to the pipeline's execution.  This 
makes the pipeline graph non-portable and blocks full migration to using the 
Runner API pipeline representation in the DirectRunner.

By contrast, the SDK's PTransformOverride mechanism allows the expression of 
matchers that operate on the unspecialized graph, replacing PTransforms as 
necessary to produce a DirectRunner-specialized pipeline graph for execution.
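
The matcher-and-replace idea can be sketched as follows.  This is a simplified 
model under stated assumptions, not the SDK's implementation: the graph is 
reduced to a flat list of transforms, and all class names, the replace_all 
helper, and the matches / get_replacement_transform method names are 
illustrative and may not match the SDK's exact interface.

```python
class Transform:
    """Hypothetical base class: a node in a toy pipeline graph."""

    def __init__(self, label):
        self.label = label


class ParDo(Transform):
    pass


class GroupByKey(Transform):
    pass


class DirectGroupByKey(Transform):
    """Hypothetical runner-specific replacement for GroupByKey."""


class GroupByKeyOverride:
    """Toy override: a matcher plus a replacement factory."""

    def matches(self, transform):
        return type(transform) is GroupByKey

    def get_replacement_transform(self, transform):
        return DirectGroupByKey(transform.label)


def replace_all(graph, overrides):
    """Return a specialized copy of graph; the input list is not modified."""
    specialized = []
    for transform in graph:
        for override in overrides:
            if override.matches(transform):
                transform = override.get_replacement_transform(transform)
                break
        specialized.append(transform)
    return specialized


# The graph is fully constructed first, then specialized for one runner.
portable_graph = [ParDo("parse"), GroupByKey("group"), ParDo("format")]
direct_graph = replace_all(portable_graph, [GroupByKeyOverride()])
```

Because replace_all builds a new list, the unspecialized graph stays available 
for other runners, which is the portability property the issue is after.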

We therefore want to replace these eager apply_* overrides with 
PTransformOverrides that operate on the completely constructed graph.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
