Charles Chen created BEAM-3566:
----------------------------------
Summary: Replace Python DirectRunner apply_* hooks with PTransformOverrides
Key: BEAM-3566
URL: https://issues.apache.org/jira/browse/BEAM-3566
Project: Beam
Issue Type: Improvement
Components: sdk-py-core
Affects Versions: 2.2.0
Reporter: Charles Chen
Assignee: Charles Chen

In the Python DirectRunner, we currently use apply_* hooks to override the
default .expand() implementation of certain transforms. For example, GroupByKey
has a special implementation in the DirectRunner, so we use an apply_* hook to
replace the implementation of GroupByKey.expand().
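
To make the hook mechanism concrete, here is a minimal toy sketch of name-based
apply_* dispatch. It does not import Beam; the classes Runner, ToyDirectRunner
and the string "PCollections" are hypothetical stand-ins, and the lookup only
approximates how a runner might find a hook for a transform class.

```python
# Toy model only: these classes are stand-ins, not Apache Beam's real ones.


class PTransform:
    def expand(self, pcoll):
        raise NotImplementedError


class GroupByKey(PTransform):
    def expand(self, pcoll):
        # Default, runner-independent expansion.
        return "generic GroupByKey(%s)" % pcoll


class Runner:
    def apply(self, transform, pcoll):
        # Look for a hook named after the transform's class or a base class.
        for cls in type(transform).__mro__:
            hook = getattr(self, "apply_%s" % cls.__name__, None)
            if hook is not None:
                return hook(transform, pcoll)
        # No hook found: fall back to the transform's own expansion.
        return transform.expand(pcoll)


class ToyDirectRunner(Runner):
    def apply_GroupByKey(self, transform, pcoll):
        # Runner-specific replacement, chosen eagerly at apply time.
        return "direct-runner GroupByKey(%s)" % pcoll


generic = Runner().apply(GroupByKey(), "input")
specialized = ToyDirectRunner().apply(GroupByKey(), "input")
```

In this sketch the result of applying GroupByKey already depends on which
runner object performed the apply, which is the eager specialization described
in this issue.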

However, this strategy has drawbacks. Because the override is applied eagerly
during graph construction, the pipeline graph is specialized and modified
before a specific runner is bound to the pipeline's execution. This makes the
pipeline graph non-portable and blocks full migration to the Runner API
pipeline representation in the DirectRunner.

By contrast, the SDK's PTransformOverride mechanism lets us express matchers
that operate on the unspecialized graph, replacing PTransforms as necessary to
produce a DirectRunner-specialized pipeline graph for execution. We therefore
want to replace these eager apply_* hooks with PTransformOverrides that operate
on the completely constructed graph.
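
As a rough illustration of the proposed direction, the toy sketch below applies
a matcher-plus-replacement override to an already constructed graph. It does
not use Beam itself: Node, GroupByKeyOverride, replace_all and the method names
matches / get_replacement_transform are assumptions chosen for illustration,
and returning a specialized copy (rather than rewriting in place) is a
simplification.

```python
# Toy model only: hypothetical names, loosely modelled on the override idea.


class Map:
    pass


class GroupByKey:
    pass


class DirectGroupByKey:
    """Stand-in for a DirectRunner-specific GroupByKey implementation."""


class Node:
    def __init__(self, label, transform):
        self.label = label
        self.transform = transform


class GroupByKeyOverride:
    def matches(self, node):
        # Matcher: selects the nodes this override should rewrite.
        return isinstance(node.transform, GroupByKey)

    def get_replacement_transform(self, transform):
        return DirectGroupByKey()


def replace_all(graph, overrides):
    """Return a specialized copy; the input graph is left untouched."""
    result = []
    for node in graph:
        transform = node.transform
        for override in overrides:
            if override.matches(node):
                transform = override.get_replacement_transform(transform)
                break
        result.append(Node(node.label, transform))
    return result


# The graph is fully constructed first, with no runner-specific nodes.
portable = [Node("parse", Map()), Node("group", GroupByKey())]
# Specialization happens afterwards, only when a runner needs it.
specialized = replace_all(portable, [GroupByKeyOverride()])
```

Here the portable graph still contains the plain GroupByKey after the
overrides run, so it could in principle be handed to a different runner
unchanged.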