Re: Is implementing DisplayData on Beam Transforms worth?

Robert Bradshaw Thu, 13 May 2021 16:50:34 -0700

+1, I definitely think display data more often belongs to composites
than to leaves. Dataflow is moving to a model where it accepts Beam protos
directly; hopefully we can get that information to the UI.


On Thu, May 13, 2021 at 4:47 PM Valentyn Tymofieiev <[email protected]> wrote:
>
> I also happened to look at the display data associated with the Beam BigQuery
> IOs. In my opinion, for IO 'display data' to be useful, it needs to be
> visualized at the top-level (composite) transforms. The BQ IOs are among the
> most complex transforms in Beam and generate very involved graphs, so display
> data becomes too hard to find within the graph.
>
> On Wed, May 12, 2021 at 9:21 AM Reuven Lax <[email protected]> wrote:
>>
>> This is arguably a bug in Dataflow's backend. The backend only knows about
>> primitive operations (ParDo, Flatten, etc.) and doesn't currently model a
>> PTransform as an independent entity. Rather, it infers the existence of a
>> PTransform from the naming of the operations (e.g., if there are operations
>> named a/b and a/c, it infers a PTransform named a containing b and c). This
>> is how the Dataflow UI knows how to display composite transforms.
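
The name-based inference described above can be illustrated with a small standalone sketch. This is a hypothetical model written for illustration (the class `CompositeInference` and method `inferComposites` are invented names), not Dataflow's actual backend code: every slash-separated prefix of a primitive step's name is treated as an enclosing composite.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: derives composite transform names from
// slash-separated primitive step names. Not Dataflow's implementation.
public class CompositeInference {

  /** Returns every proper prefix path ("a", "a/b", ...) implied by the step names. */
  static Set<String> inferComposites(List<String> stepNames) {
    Set<String> composites = new TreeSet<>();
    for (String step : stepNames) {
      String[] parts = step.split("/");
      StringBuilder prefix = new StringBuilder();
      // All prefixes except the full name are treated as enclosing composites.
      for (int i = 0; i < parts.length - 1; i++) {
        if (i > 0) {
          prefix.append('/');
        }
        prefix.append(parts[i]);
        composites.add(prefix.toString());
      }
    }
    return composites;
  }

  public static void main(String[] args) {
    // Steps "a/b" and "a/c" imply a composite "a"; "d" has no enclosing composite.
    System.out.println(inferComposites(List.of("a/b", "a/c", "d"))); // prints [a]
  }
}
```

In this model a composite exists only as a name prefix, so there is no object to which its `populateDisplayData` output could be attached, which is consistent with the display data getting "lost."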
>>
>> Should Google support PTransforms as a top-level object? Yes. As you
>> noticed, this is an easy way to trip up, and sometimes innocent-seeming
>> refactoring can cause display data to get "lost." I'm not sure what the
>> current priority of this bug is, and it may not be fixed until things are
>> fully on portable pipelines. For now, I suggest putting display data on
>> primitive operations.
>>
>> Reuven
>>
>> On Wed, May 12, 2021 at 7:10 AM Ismaël Mejía <[email protected]> wrote:
>>>
>>> While running a pipeline on Dataflow, I noticed that the Dataflow UI was not
>>> showing the 'display data' of ParquetIO. After digging deeper, I found that
>>> display data for composite transforms is not shown on Dataflow.
>>>
>>> BEAM-366 Support Display Data on Composite Transforms
>>> https://issues.apache.org/jira/browse/BEAM-366
>>>
>>> I also noticed that for primitive transforms, what is shown is not the
>>> populateDisplayData override in the PTransform subclass but the
>>> populateDisplayData method implemented at the parameterizing function
>>> level, concretely the DoFn or Source in the case of IOs.
>>>
>>> This of course surprised me, because for years we have been implementing
>>> all these methods in the wrong place (at the PTransform level) while
>>> ignoring the function level, so they are not shown in the UI. So I was
>>> wondering:
>>>
>>> 1. Does Google plan to support displaying composite transforms (BEAM-366) 
>>> at some point?
>>>
>>> 2. If (1) is not happening soon, should we move all our
>>> populateDisplayData implementations to the function level only (DoFn,
>>> Source, WindowFn)?
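
A minimal sketch of what option (2) could look like in the Beam Java SDK, assuming the SDK is on the classpath. `ReadRecords`, `ReadFn`, and `filePattern` are hypothetical names chosen for illustration, not ParquetIO's actual code. The configuration is passed into the DoFn, which registers it in its own `populateDisplayData` (the level that, per the observation above, Dataflow shows for the primitive ParDo); the transform-level registration is kept so the item would still appear if BEAM-366 is fixed.

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical composite transform used only to illustrate where display data lives.
class ReadRecords extends PTransform<PCollection<String>, PCollection<String>> {
  private final String filePattern;

  ReadRecords(String filePattern) {
    this.filePattern = filePattern;
  }

  @Override
  public PCollection<String> expand(PCollection<String> input) {
    // Hand the same configuration to the DoFn so it can report it itself.
    return input.apply("ReadFn", ParDo.of(new ReadFn(filePattern)));
  }

  @Override
  public void populateDisplayData(DisplayData.Builder builder) {
    super.populateDisplayData(builder);
    // Composite-level display data: per this thread, not surfaced by Dataflow today.
    builder.add(DisplayData.item("filePattern", filePattern).withLabel("File Pattern"));
  }

  static class ReadFn extends DoFn<String, String> {
    private final String filePattern;

    ReadFn(String filePattern) {
      this.filePattern = filePattern;
    }

    @ProcessElement
    public void processElement(@Element String element, OutputReceiver<String> out) {
      // Placeholder processing: pass elements through unchanged.
      out.output(element);
    }

    @Override
    public void populateDisplayData(DisplayData.Builder builder) {
      super.populateDisplayData(builder);
      // Function-level display data: the level shown for the primitive ParDo.
      builder.add(DisplayData.item("filePattern", filePattern).withLabel("File Pattern"));
    }
  }
}
```

`DisplayData.from(new ReadRecords.ReadFn("gs://bucket/*.parquet"))` returns the function-level items. Registering the item at both levels is one possible design choice for the transition, not something the thread settled on.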
>>>
>>> Since open source runners (Flink, Spark, etc.) do not use DisplayData at
>>> all, I suppose we should keep this discussion at the Dataflow level for
>>> now.
>>>
>>> I don't know how this is modeled in portable pipelines: is DisplayData part
>>> of FunctionSpec to support the current use case? I saw that DisplayData is
>>> modeled at the PTransform level, which should cover the composite case, so
>>> I am curious whether the parameterized-function level currently in use is
>>> handled correctly for portable pipelines.
>>>
