[ 
https://issues.apache.org/jira/browse/BEAM-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Chen updated BEAM-3566:
-------------------------------
    Fix Version/s: 2.4.0

> Replace Python DirectRunner apply_* hooks with PTransformOverrides
> ------------------------------------------------------------------
>
>                 Key: BEAM-3566
>                 URL: https://issues.apache.org/jira/browse/BEAM-3566
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>    Affects Versions: 2.2.0
>            Reporter: Charles Chen
>            Assignee: Charles Chen
>            Priority: Major
>             Fix For: 2.4.0
>
>
> In the Python DirectRunner, we currently use apply_* hooks to override the 
> default .expand() implementation of certain transforms.  For example, 
> GroupByKey has a special implementation in the DirectRunner, so we use an 
> apply_* hook to replace the implementation of GroupByKey.expand().
>
> However, this strategy has drawbacks.  Because the override happens eagerly 
> during graph construction, the pipeline graph is specialized and modified 
> before a specific runner is bound to the pipeline's execution.  This makes 
> the pipeline graph non-portable and blocks full migration to the Runner API 
> pipeline representation in the DirectRunner.
>
> By contrast, the SDK's PTransformOverride mechanism lets us express matchers 
> that operate on the unspecialized graph, replacing PTransforms as necessary 
> to produce a DirectRunner-specialized pipeline graph for execution.
>
> We therefore want to replace these eager apply_* overrides with 
> PTransformOverrides that operate on the completely constructed graph.
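
The difference between eager hooks and overrides applied to a finished graph can be illustrated with a toy model. This is a minimal sketch in plain Python, not Beam code: the Node and Override classes, the replace_all function, and the "DirectGroupByKey" name are all hypothetical and only mimic the matcher-plus-replacement idea described above. The point it tries to show is that the graph built at construction time stays runner-agnostic, and specialization produces a separate copy just before execution.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One applied transform in a toy pipeline graph (hypothetical, not a Beam class)."""
    label: str
    kind: str  # e.g. "GroupByKey", "Map"
    children: List["Node"] = field(default_factory=list)


@dataclass
class Override:
    """Hypothetical stand-in for a matcher/replacement pair."""
    matches: Callable[[Node], bool]
    replacement_kind: Callable[[Node], str]


def replace_all(root: Node, overrides: List[Override]) -> Node:
    """Return a specialized copy of the graph; the input graph is not modified."""
    kind = root.kind
    for override in overrides:
        if override.matches(root):
            # First matching override wins in this sketch.
            kind = override.replacement_kind(root)
            break
    new_children = [replace_all(child, overrides) for child in root.children]
    return Node(root.label, kind, new_children)


# Runner-agnostic graph, as it would exist after pipeline construction.
pipeline = Node("root", "Composite", [
    Node("read", "Read"),
    Node("group", "GroupByKey"),
    Node("format", "Map"),
])

# Overrides that a hypothetical direct runner applies just before execution.
# "DirectGroupByKey" is an invented name for the runner-specific implementation.
direct_overrides = [
    Override(lambda n: n.kind == "GroupByKey", lambda n: "DirectGroupByKey"),
]

specialized = replace_all(pipeline, direct_overrides)
print([child.kind for child in specialized.children])
print([child.kind for child in pipeline.children])
```

Because replace_all copies rather than mutates, the same unspecialized graph could in principle be handed to a different runner with a different override list, which is the portability property the description is after.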



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
