Thanks Kenn.  We already do the Runner API roundtripping (I believe Robert
implemented this).  With this change, we would start doing exactly what
you're suggesting, where we apply overrides to a post-deserialization
pipeline.

On Thu, Feb 1, 2018 at 6:45 PM Kenneth Knowles <k...@google.com> wrote:

> +1 for removing apply_*
>
> For the Java SDK, removing specialized intercepts was an important first
> step towards the portability framework. I wonder if there is a way for the
> Python SDK to leapfrog, taking advantage of some of the lessons that Java
> learned a bit more painfully. Most pertinent I think is that if an SDK's
> role is to construct a pipeline and ship the proto to a runner (service)
> then overrides apply to a post-deserialization pipeline. The Java
> DirectRunner does a proto round-trip to avoid accidentally depending on
> things that are not really part of the pipeline. I would this crisp
> abstraction enforcement would add even more value to Python.
>
> Kenn
>
> On Thu, Feb 1, 2018 at 5:21 PM, Charles Chen <c...@google.com> wrote:
>
>> In the Python DirectRunner, we currently use apply_* overrides to
>> override the operation of the default .expand() operation for certain
>> transforms. For example, GroupByKey has a special implementation in the
>> DirectRunner, so we use an apply_* override hook to replace the
>> implementation of GroupByKey.expand().
>>
>> However, this strategy has drawbacks. Because this override operation
>> happens eagerly during graph construction, the pipeline graph is
>> specialized and modified before a specific runner is bound to the
>> pipeline's execution. This makes the pipeline graph non-portable and blocks
>> full migration to using the Runner API pipeline representation in the
>> DirectRunner.
>>
>> By contrast, the SDK's PTransformOverride mechanism allows the expression
>> of matchers that operate on the unspecialized graph, replacing PTransforms
>> as necessary to produce a DirectRunner-specialized pipeline graph for
>> execution.
>>
>> I therefore propose to replace these eager apply_* overrides with
>> PTransformOverrides that operate on the completely constructed graph.
>>
>> The JIRA issue is https://issues.apache.org/jira/browse/BEAM-3566, and
>> I've prepared a candidate patch at
>> https://github.com/apache/incubator-beam/pull/4529.
>>
>> Best,
>> Charles
>>
>
>

Reply via email to