Thomas is working on this pretty explicitly. Beam needs this for the Runner/Fn APIs -- except that the unique IDs will probably be numbers or hashes, so that they are more usable than long strings.
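To make the idea concrete, here is a rough sketch of the two pieces under discussion: disambiguating full transform names with a running number on collision, and deriving a shorter hash-based ID from the resulting name. Everything in it is an assumption for illustration only: the class `UniqueTransformNames`, its methods `assign` and `shortId`, the `#` separator, and the choice of a truncated SHA-256 are hypothetical and are not what the Beam SDK, the Dataflow runner, or the Runner/Fn APIs actually do.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of the Beam SDK or any runner.
final class UniqueTransformNames {
  // Every name handed out so far, so that no name is ever returned twice.
  private final Set<String> used = new HashSet<>();

  /**
   * Returns fullName unchanged on first use. On a collision, appends "#2",
   * "#3", ... until the candidate has not been handed out before.
   */
  String assign(String fullName) {
    String candidate = fullName;
    int n = 1;
    while (!used.add(candidate)) {
      n++;
      candidate = fullName + "#" + n;
    }
    return candidate;
  }

  /**
   * Derives a short, stable ID from a unique name: the first 8 bytes of its
   * SHA-256 digest, rendered as 16 lowercase hex characters.
   */
  static String shortId(String uniqueName) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest(uniqueName.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < 8; i++) {
        sb.append(String.format("%02x", digest[i]));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required to be available on every Java platform.
      throw new IllegalStateException(e);
    }
  }
}
```

With this sketch, calling `assign("Read/ParDo")` twice on the same instance yields `Read/ParDo` and then `Read/ParDo#2`, and `shortId` of either gives a fixed-length ID that is easier to pass around than the full path. A truncated hash can collide in principle, so a real design would have to decide whether plain numbers are the safer choice.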
The code to check whether names are unique, etc., is actually in the SDK core right now. See, e.g.,
https://github.com/apache/beam/blob/7984fe3fc20160d2286433434190f35658aef158/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java#L359

Dan

On Sun, Jan 29, 2017 at 5:33 AM, Aviem Zur <[email protected]> wrote:

> Hi all,
>
> While working on implementing metrics support in the Spark Runner a need
> arose for composing a unique identifier of a transform, to differentiate it
> from other transforms with the same name.
>
> With the help of @bjchambers I understood that something similar to this
> exists in the Dataflow Runner which creates a string that is something
> along the lines of
> "PBegin/SomeInputTransform/SomeParDo/...MyTransform.# Running_number_for_collisions".
>
> I'm trying to figure out:
> A) How this is done in Dataflow runner.
> B) Can be pulled up as a util for other runners, as conversation regarding
> metrics API and querying is hinting this will be needed.
> C) From my own forays into the code I came across
> `org.apache.beam.sdk.values.PValue#getProducingTransformInternal` which can
> be recursed on but is marked as deprecated. Are there efforts being made
> elsewhere for this sort of pipeline graph reflection?
>
