Re: Adding a StepMetadataRegistry for Python SDK

Lukasz Cwik Fri, 30 Mar 2018 10:09:11 -0700

+1 on minimizing the creation of new stuff that will be deleted, but if it gets us
to that goal faster, it can still be worthwhile.


On Thu, Mar 29, 2018 at 5:51 PM Robert Bradshaw <rober...@google.com> wrote:

> If I understand correctly, this is something runner-specific that would
> live solely on the runner side, right? (I.e. over the Fn API we'd still have a
> single name for operations rather than pushing this complexity into that
> protocol as well, which I'd really like to avoid.) If that's the
> case, then it's a bit unclear what we'd be doing on the Python side, as all
> the non-SDK worker code is going to be thrown away in the new world and I'd
> like to avoid investing too much more there.
>
> On Wed, Mar 28, 2018 at 5:13 PM Pablo Estrada <pabl...@google.com> wrote:
>
>> Hello all,
>> I've filed https://issues.apache.org/jira/browse/BEAM-3955 to consider
>> adding some sort of facility for translating between the different step
>> names that runners use.
>> This is currently a problem in Dataflow, where steps can have different
>> names in the backend and in the SDK.
>> This is observable in Beam code, where different parts of the
>> SDK/worker/runners use different names in their metrics:
>>
>> - Logging uses Beam transform names (e.g. Foo/Bar)
>> - Metrics uses operation_name (e.g. s2)
>> - Statesampler uses operation_name.
>> - The Dataflow worker sets step_name to operation_name after creating the
>> operation.
>>
>> I'd like to propose the following design outline:
>>
>>    - Create an *execution context* that will allow runners to provide
>>    their specific functionality.
>>    - The execution context will be able to provide several pieces of
>>    runner-specific functionality (e.g. side input fetchers).
>>    - In this case, the execution contexts can have a StepNameRegistry,
>>    or StepRegistry, or StepMetadataRegistry of some kind, where step names
>>    and other metadata can be enrolled.
>>    - Runners can pass their execution contexts to operations, logging,
>>    and other modules.
>>    - Beam core can then switch to using Beam step names, and each runner's
>>    specific monitoring / metrics / etc. classes can have their own logic for
>>    accessing these.
>>    - This would also allow us to remove the LoggingContext tracking, and
>>    rely only on statesampler for context tracking.
>>
>> Eventually, all of this should be fully contained in the portability API
>> and runners won't have to deal with these issues, but for now it seems like
>> a good compromise.
>>
>> If this sounds good, I'll start working on implementing it.
>> Note that this is only a rough description, and I'm open to reconsidering
>> any and all aspects.
>>
>> Best
>> -P.
>>
>
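
A minimal Python sketch of the design outline quoted above, assuming the registry is indexed by both the Beam transform name (e.g. Foo/Bar) and the runner's operation name (e.g. s2), and lives purely on the runner side so nothing extra crosses the Fn API. Every class here (StepMetadata, StepMetadataRegistry, ExecutionContext, RunnerMetricsReporter) is hypothetical and illustrative only; none of them is an existing Beam API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StepMetadata:
    """Hypothetical record of the names a single step goes by."""
    beam_name: str       # Beam transform name, e.g. "Foo/Bar"
    operation_name: str  # runner/worker operation name, e.g. "s2"
    extra: Dict[str, str] = field(default_factory=dict)  # other metadata


class StepMetadataRegistry:
    """Hypothetical registry where runners enroll step names and metadata."""

    def __init__(self) -> None:
        self._by_beam_name: Dict[str, StepMetadata] = {}
        self._by_operation_name: Dict[str, StepMetadata] = {}

    def register(self, metadata: StepMetadata) -> None:
        # Index the same record under both names so lookups work either way.
        self._by_beam_name[metadata.beam_name] = metadata
        self._by_operation_name[metadata.operation_name] = metadata

    def by_beam_name(self, beam_name: str) -> Optional[StepMetadata]:
        return self._by_beam_name.get(beam_name)

    def by_operation_name(self, operation_name: str) -> Optional[StepMetadata]:
        return self._by_operation_name.get(operation_name)


class ExecutionContext:
    """Hypothetical container for runner-specific facilities."""

    def __init__(self, step_registry: StepMetadataRegistry) -> None:
        self.step_registry = step_registry
        # Other runner-specific facilities (e.g. side input fetchers)
        # would be attached here as well.


class RunnerMetricsReporter:
    """Hypothetical runner-specific consumer: core code uses Beam names,
    and this class maps them to the runner's own step names."""

    def __init__(self, context: ExecutionContext) -> None:
        self._context = context

    def runner_step_name(self, beam_name: str) -> str:
        metadata = self._context.step_registry.by_beam_name(beam_name)
        # Fall back to the Beam name if the runner never enrolled the step.
        return metadata.operation_name if metadata is not None else beam_name


# Usage: the runner enrolls its steps once, then passes the context around.
registry = StepMetadataRegistry()
registry.register(StepMetadata(beam_name="Foo/Bar", operation_name="s2"))
context = ExecutionContext(registry)
reporter = RunnerMetricsReporter(context)
print(reporter.runner_step_name("Foo/Bar"))  # prints "s2"
```

Keeping a single record per step, indexed two ways, is one possible choice; it lets logging stay on Beam names while a runner's metrics code still resolves the backend name, which is the split described in the outline.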
