While running a pipeline on Dataflow I noticed that the 'display data' of
ParquetIO was not showing up in the Dataflow UI. After digging deeper I
found that display data on composite transforms is not shown on Dataflow.

BEAM-366 Support Display Data on Composite Transforms
https://issues.apache.org/jira/browse/BEAM-366

I also noticed that for primitive transforms, what is shown is not the
populateDisplayData override on the PTransform, but the populateDisplayData
method implemented on the parameterizing function, concretely the DoFn (or
the Source in the case of IOs).
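
To make it concrete, here is a minimal sketch of the pattern I am describing
(the names ReadThings, ReadFn and the "path" item are made up, not taken from
ParquetIO):

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical composite transform, only to illustrate where the UI looks.
class ReadThings extends PTransform<PCollection<String>, PCollection<String>> {
  private final String path;
  ReadThings(String path) { this.path = path; }

  @Override
  public PCollection<String> expand(PCollection<String> input) {
    return input.apply(ParDo.of(new ReadFn(path)));
  }

  // Declared on the composite: this is what Dataflow does NOT show today
  // (BEAM-366).
  @Override
  public void populateDisplayData(DisplayData.Builder builder) {
    super.populateDisplayData(builder);
    builder.add(DisplayData.item("path", path).withLabel("Input path"));
  }

  static class ReadFn extends DoFn<String, String> {
    private final String path;
    ReadFn(String path) { this.path = path; }

    @ProcessElement
    public void processElement(ProcessContext c) {
      c.output(c.element());
    }

    // Declared on the DoFn: this is what actually shows up in the UI,
    // attached to the primitive ParDo step.
    @Override
    public void populateDisplayData(DisplayData.Builder builder) {
      builder.add(DisplayData.item("path", path).withLabel("Input path"));
    }
  }
}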

This of course surprised me, because we have been implementing all these
methods in the wrong place (at the PTransform level) for years while ignoring
the function level, so they are not shown in the UI. So I was wondering:

1. Does Google plan to support display data on composite transforms
(BEAM-366) at some point?

2. If (1) is not happening soon, shall we refine all our
populateDisplayData implementations to be done only at the function level
(DoFn, Source, WindowFn)? See the sketch after this list.
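
One possible shape for (2), to avoid duplicating the items, would be to keep
them on a serializable spec object (as most IOs already have) and let the
DoFn surface them via DisplayData.Builder#delegate. The ReadSpec/ReadFn names
below are hypothetical, just a sketch of the idea:

import java.io.Serializable;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.beam.sdk.transforms.display.HasDisplayData;

// Hypothetical serializable spec that owns the configuration and its display data.
class ReadSpec implements Serializable, HasDisplayData {
  final String path;
  ReadSpec(String path) { this.path = path; }

  @Override
  public void populateDisplayData(DisplayData.Builder builder) {
    builder.add(DisplayData.item("path", path).withLabel("Input path"));
  }
}

// The DoFn just delegates, so the items appear on the primitive ParDo step.
class ReadFn extends DoFn<String, String> {
  private final ReadSpec spec;
  ReadFn(ReadSpec spec) { this.spec = spec; }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(c.element());
  }

  @Override
  public void populateDisplayData(DisplayData.Builder builder) {
    builder.delegate(spec);
  }
}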

Since the open source runners (Flink, Spark, etc.) do not use DisplayData at
all, I suppose we should keep this discussion scoped to Dataflow for now.

I don't know how this is modeled in portable pipelines: is DisplayData part
of FunctionSpec, to support the current use case? I saw that DisplayData is
considered at the PTransform level, which should cover the composite case,
so I am curious whether the parameterizing-function level currently in use
is handled correctly for portable pipelines.
