+1 on minimizing new work that will later be deleted, but if it gets us to that goal faster it can still be worthwhile.
On Thu, Mar 29, 2018 at 5:51 PM Robert Bradshaw <rober...@google.com> wrote:
> If I understand correctly, this is something runner-specific that would
> live solely on the runner side (i.e. over the Fn API we'd still have a
> single name for operations rather than pushing this complexity into that
> protocol as well which I'd really like to avoid, right?) If that's the
> case, then it's a bit unclear what we'd be doing on the Python side, as all
> the non-SDK worker code is going to be thrown away in the new world and I'd
> like to avoid investing too much more there.
>
> On Wed, Mar 28, 2018 at 5:13 PM Pablo Estrada <pabl...@google.com> wrote:
>
>> Hello all,
>> I've filed https://issues.apache.org/jira/browse/BEAM-3955, to consider
>> the possibility of adding some sort of facility to translate different
>> names for the runners.
>> This is currently a problem in Dataflow, where steps can have different
>> names in the backend and in the SDK.
>> This is observable in Beam code, where different parts of the
>> SDK/worker/runners use different names in their metrics:
>>
>> - Logging uses Beam transform names (e.g. Foo/Bar)
>> - Metrics uses operation_name (e.g. s2)
>> - Statesampler uses operation_name.
>> - The Dataflow worker sets step_name to operation_name after creating the
>> operation.
>>
>> I'd like to propose the following design outline:
>>
>> - Create an *execution context* that will allow runners to provide
>> their specific functionality.
>> - Execution context will be able to provide multiple runner-specific
>> functionality (e.g. side input fetchers).
>> - In this case, the execution contexts can have a StepNameRegistry,
>> or StepRegistry, or StepMetadataRegistry of some kind, where step names
>> and other metadata can be enrolled.
>> - Runners can pass their execution contexts to operations, logging,
>> and other modules.
>> - Beam core can then switch to use Beam step names, and each runner's
>> specific monitoring / metrics / etc classes can have their own logic for
>> accessing these.
>> - This would also allow us to remove the LoggingContext tracking, and
>> rely only on statesampler for context tracking.
>>
>> Eventually, all of this should be fully contained in the portability API
>> and runners won't have to deal with these issues, but for now it seems like
>> a good compromise.
>>
>> If this sounds good, I'll start working to implement that.
>> Note that this is only a rough description, and I'm open to reconsider
>> any and all aspects.
>>
>> Best
>> -P.
>> --
>> Got feedback? go/pabloem-feedback
>> <https://goto.google.com/pabloem-feedback>
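
For illustration only, here is a rough Python sketch of how the execution context and step registry from the outline above might fit together. None of this is existing Beam code: the class shapes, `register`, `step_name`, and `log_prefix` are hypothetical names chosen for the example (only "StepNameRegistry" and "execution context" come from the proposal), and the fallback to the operation name is an assumption.

```python
class StepNameRegistry(object):
    """Hypothetical registry mapping runner operation names to Beam step names."""

    def __init__(self):
        self._entries = {}

    def register(self, operation_name, step_name, **metadata):
        # Enroll the Beam step name plus any runner-specific metadata.
        entry = dict(metadata)
        entry["step_name"] = step_name
        self._entries[operation_name] = entry

    def step_name(self, operation_name):
        # Assumption: fall back to the operation name if nothing was enrolled.
        entry = self._entries.get(operation_name)
        return entry["step_name"] if entry else operation_name


class ExecutionContext(object):
    """Hypothetical container for runner-specific functionality."""

    def __init__(self, step_registry=None):
        if step_registry is None:
            step_registry = StepNameRegistry()
        self.step_registry = step_registry


def log_prefix(context, operation_name):
    # Logging, metrics, etc. receive the context and resolve names through it.
    return "[%s]" % context.step_registry.step_name(operation_name)


registry = StepNameRegistry()
registry.register("s2", "Foo/Bar")  # Dataflow-style name -> Beam transform name
context = ExecutionContext(registry)
print(log_prefix(context, "s2"))  # prints "[Foo/Bar]"
```

The point of the sketch is that operations, logging, and metrics would only ever see the context, so each runner could decide what gets enrolled in the registry without the core code needing to know about runner-specific names.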