This is arguably a bug in Dataflow's backend. The backend only knows about primitive operations (ParDo, Flatten, etc.), and doesn't currently model a PTransform as an independent entity. Rather, it infers the existence of the PTransform from the naming of the operations (e.g. if you have operations named a/b and a/c, it infers a PTransform named a containing b and c). This is how the Dataflow UI knows how to display composite transforms.
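To make the naming-based inference concrete, here is a minimal sketch in Python. It is only an illustration of the idea described above, not Dataflow's actual backend code; the function name `infer_composites` and the sample operation names are made up for the example.

```python
from collections import defaultdict


def infer_composites(operation_names):
    """Infer composite transforms from slash-separated primitive operation names.

    Hypothetical helper for illustration only: every proper prefix of an
    operation's path is treated as a composite that contains the next path
    segment.
    """
    children = defaultdict(set)
    for name in operation_names:
        parts = name.split("/")
        # For "a/b", the prefix "a" becomes a composite containing "b".
        for depth in range(1, len(parts)):
            parent = "/".join(parts[:depth])
            children[parent].add(parts[depth])
    return {parent: sorted(kids) for parent, kids in children.items()}


# "a" exists only because two primitive operations share its prefix.
print(infer_composites(["a/b", "a/c", "d"]))
```

In a scheme like this the composite "a" is never a real object, only a shared name prefix, so there is nothing on which its own display data could be stored.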
Should Google support PTransforms as a top-level object? Yes. As you noticed, this is an easy way to trip up, and sometimes innocent-seeming refactoring can cause display data to get "lost." I'm not sure what the current priority of this bug is, and it may not be fixed until things are fully on portable pipelines. For now, I suggest putting display data on primitive operations.

Reuven

On Wed, May 12, 2021 at 7:10 AM Ismaël Mejía <[email protected]> wrote:

> Running a pipeline on Dataflow, I noticed it was not showing the 'display
> data' of ParquetIO on the Dataflow UI. After digging deeper, I found that
> the display data of composite transforms is not shown on Dataflow.
>
> BEAM-366 Support Display Data on Composite Transforms
> https://issues.apache.org/jira/browse/BEAM-366
>
> I also noticed that for primitive transforms what is shown is not the
> populateDisplayData code extended from PTransform but the
> populateDisplayData method implemented at the parametrizing function
> level, concretely the DoFn or Source in the case of IOs.
>
> This of course surprised me, because for years we have been implementing
> all these methods in the wrong place (at the PTransform level) and ignoring
> the function, so they are not shown in the UI. So I was wondering:
>
> 1. Does Google plan to support displaying composite transforms (BEAM-366)
> at some point?
>
> 2. If (1) is not happening soon, shall we refine all our
> populateDisplayData implementations to be done only at the Function level
> (DoFn, Source, WindowFn)?
>
> Since Open Source runners (Flink, Spark, etc.) do not use DisplayData at
> all, I suppose we should keep this discussion at the Dataflow level only
> for now.
>
> I don't know how this is modeled on Portable Pipelines. Is DisplayData part
> of FunctionSpec to support the current use case? I saw that DisplayData is
> considered at the PTransform level, so this should cover the Composite case,
> so I am curious whether the parametrized function level currently in use is
> handled correctly for Portable pipelines.
