[ https://issues.apache.org/jira/browse/BEAM-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated BEAM-3566:
-------------------------------
Fix Version/s: 2.4.0
> Replace Python DirectRunner apply_* hooks with PTransformOverrides
> ------------------------------------------------------------------
>
> Key: BEAM-3566
> URL: https://issues.apache.org/jira/browse/BEAM-3566
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Affects Versions: 2.2.0
> Reporter: Charles Chen
> Assignee: Charles Chen
> Priority: Major
> Fix For: 2.4.0
>
>
> In the Python DirectRunner, we currently use apply_* hooks to override the
> default .expand() implementation of certain transforms. For example,
> GroupByKey has a special implementation in the DirectRunner, so we use an
> apply_* hook to replace the implementation of GroupByKey.expand().
>
> However, this strategy has drawbacks. Because the override happens eagerly
> during graph construction, the pipeline graph is specialized and modified
> before a specific runner is bound to the pipeline's execution. This makes
> the pipeline graph non-portable and blocks full migration to the Runner API
> pipeline representation in the DirectRunner.
>
> By contrast, the SDK's PTransformOverride mechanism lets us express matchers
> that operate on the unspecialized graph, replacing PTransforms as necessary
> to produce a DirectRunner-specialized pipeline graph for execution. We
> therefore want to replace these eager apply_* hooks with PTransformOverrides
> that operate on the fully constructed graph.
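
The idea described above can be illustrated with a small toy model. This is only a sketch and does not use the Beam SDK: Transform, Override, DirectGroupByKeyOverride, and replace_all are hypothetical names invented for illustration, and the real PTransformOverride interface may differ. It shows a matcher-plus-replacement pass that runs over a fully constructed graph and leaves the original, portable graph untouched.

```python
# Toy model only: none of these names come from the Beam SDK.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transform:
    """A node in a toy pipeline graph (hypothetical)."""
    name: str
    kind: str  # e.g. "Read", "GroupByKey"


class Override:
    """Hypothetical analogue of an override: a matcher plus a replacement."""

    def matches(self, transform: Transform) -> bool:
        raise NotImplementedError

    def get_replacement(self, transform: Transform) -> Transform:
        raise NotImplementedError


class DirectGroupByKeyOverride(Override):
    """Swaps the generic GroupByKey node for a runner-specific one."""

    def matches(self, transform: Transform) -> bool:
        return transform.kind == "GroupByKey"

    def get_replacement(self, transform: Transform) -> Transform:
        return Transform(transform.name, "DirectGroupByKey")


def replace_all(graph: List[Transform],
                overrides: List[Override]) -> List[Transform]:
    """Return a runner-specialized copy; the input graph is not modified."""
    specialized = []
    for transform in graph:
        for override in overrides:
            if override.matches(transform):
                transform = override.get_replacement(transform)
                break  # apply at most one override per node
        specialized.append(transform)
    return specialized


# The graph is built once, without any runner-specific knowledge.
portable = [Transform("read", "Read"), Transform("group", "GroupByKey")]

# Specialization happens afterwards, once the runner is known.
specialized = replace_all(portable, [DirectGroupByKeyOverride()])
```

Because replace_all runs after construction and returns a new list, the same portable graph could in principle be handed to a different runner with a different set of overrides, which is the property the eager apply_* hooks do not provide.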