Hello all,

I've filed https://issues.apache.org/jira/browse/BEAM-3955 to consider adding some sort of facility for translating step names between the SDK and the runners. This is currently a problem in Dataflow, where steps can have different names in the backend and in the SDK. It is observable in Beam code, where different parts of the SDK/worker/runners use different names for the same step:
- Logging uses Beam transform names (e.g. Foo/Bar).
- Metrics uses operation_name (e.g. s2).
- Statesampler uses operation_name.
- The Dataflow worker sets step_name to operation_name after creating the operation.

I'd like to propose the following design outline:

- Create an execution context that will allow runners to provide their specific functionality.
- The execution context will be able to provide several pieces of runner-specific functionality (e.g. side input fetchers).
- In this case, the execution context can have a StepNameRegistry, or StepRegistry, or StepMetadataRegistry of some kind, where step names and other metadata can be registered.
- Runners can pass their execution contexts to operations, logging, and other modules.
- Beam core can then switch to using Beam step names, and each runner's specific monitoring / metrics / etc. classes can have their own logic for accessing these.
- This would also allow us to remove the LoggingContext tracking, and rely only on statesampler for context tracking.

Eventually, all of this should be fully contained in the portability API and runners won't have to deal with these issues, but for now it seems like a good compromise.

If this sounds good, I'll start working on the implementation. Note that this is only a rough description, and I'm open to reconsidering any and all aspects.

Best
-P.

--
Got feedback? go/pabloem-feedback
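
For illustration only, the registry idea in the outline above could be sketched roughly as follows. This is a minimal sketch, not existing Beam code: the class names StepMetadata, StepRegistry, and ExecutionContext, and the method names register and step_name_for, are all hypothetical, chosen only to show how a runner-internal operation_name (e.g. s2) might be mapped back to a Beam transform name (e.g. Foo/Bar).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class StepMetadata:
    """Hypothetical record tying a runner-internal name to a Beam name."""
    operation_name: str  # runner-internal name, e.g. "s2"
    step_name: str       # Beam transform name, e.g. "Foo/Bar"


class StepRegistry:
    """Hypothetical registry where a runner enrolls its steps."""

    def __init__(self) -> None:
        self._by_operation: Dict[str, StepMetadata] = {}

    def register(self, operation_name: str, step_name: str) -> None:
        self._by_operation[operation_name] = StepMetadata(
            operation_name, step_name)

    def step_name_for(self, operation_name: str) -> str:
        # Fall back to the runner-internal name if nothing was registered.
        meta = self._by_operation.get(operation_name)
        return meta.step_name if meta is not None else operation_name


class ExecutionContext:
    """Hypothetical container for runner-specific functionality."""

    def __init__(self, step_registry: Optional[StepRegistry] = None) -> None:
        self.step_registry = (
            step_registry if step_registry is not None else StepRegistry())


# A runner would populate the registry and hand the context to
# operations, logging, and metrics code.
context = ExecutionContext()
context.step_registry.register("s2", "Foo/Bar")
print(context.step_registry.step_name_for("s2"))
```

With something like this, logging, metrics, and the statesampler could all resolve names through the same context, and a runner that has no mapping for a step would simply see its own internal name returned.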
