Scott Wegner created BEAM-2450:
----------------------------------

             Summary: Transform names and named applications should not be null or empty
                 Key: BEAM-2450
                 URL: https://issues.apache.org/jira/browse/BEAM-2450
             Project: Beam
          Issue Type: Bug
          Components: beam-model, sdk-java-core, sdk-py
            Reporter: Scott Wegner
            Assignee: Frances Perry
            Priority: Minor


The Beam SDK allows setting the name of a transform [1] and also naming the 
application of a transform [2]. If no name is specified on application, the 
name of the transform is used. If no name is specified for the transform, the 
class name is used.
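
The fallback order described above can be sketched roughly as follows. This is 
a minimal illustration, not the SDK's actual code: ExampleTransform and 
resolveApplicationName are hypothetical stand-ins, and the real PTransform 
class may derive its default name differently.

```java
public class NameResolution {

    /** Hypothetical stand-in for a transform; this is NOT Beam's PTransform. */
    static class ExampleTransform {
        private final String name; // null when the user did not set a name

        ExampleTransform(String name) {
            this.name = name;
        }

        /** Falls back to the class name when no transform name was set. */
        String getName() {
            return name != null ? name : getClass().getSimpleName();
        }
    }

    /**
     * Hypothetical helper: prefers the name given at application time,
     * then the transform's own name (which may itself be the class name).
     */
    static String resolveApplicationName(String applicationName, ExampleTransform transform) {
        return applicationName != null ? applicationName : transform.getName();
    }
}
```

Note that in this sketch an empty string passes through unchanged at both 
levels, which is the gap this issue is about.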

The application name serves as metadata for the applied PTransforms in the 
constructed graph. Names are effectively extra display data (historically, 
PTransform names predate display data). The names are used by runners for UI 
and monitoring applications, such as the displayed pipeline graph in the 
Dataflow Monitoring UI [3].

Currently there is no explicit validation on the specified application name. 
The current behavior seems to be:
* null application names cause a NullPointerException at construction time.
* Specifying the empty string compiles and succeeds in the DirectRunner, but 
causes strange behavior in Dataflow when rendering the graph in the UI. I have 
not tested the behavior of other runners.

We should add explicit validation in the model on the specified transform name 
and application name. I propose that we disallow null and empty names.
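
As a rough sketch of what such a check might look like (validateName is a 
hypothetical helper, not an existing SDK method, and the choice of 
IllegalArgumentException for both cases is an assumption):

```java
public class TransformNameValidation {

    /**
     * Hypothetical helper: rejects null and empty names, otherwise returns
     * the name unchanged. The 'kind' argument only labels the error message,
     * e.g. "transform name" or "application name".
     */
    static String validateName(String name, String kind) {
        if (name == null) {
            throw new IllegalArgumentException(kind + " must not be null");
        }
        if (name.isEmpty()) {
            throw new IllegalArgumentException(kind + " must not be empty");
        }
        return name;
    }
}
```

Calling a check like this at construction time would make both cases fail the 
same way on every runner, instead of surfacing later as a 
NullPointerException or as a rendering problem in a runner's UI.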

This is technically a breaking change, as the SDK currently allows the empty 
string, but only because the behavior is under-specified. The upgrade path for 
any pipelines broken by this change is simple: specify a non-empty name or fall 
back to the default class name.

[1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L236
[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollection.java#L295
[3] https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf#viewing-a-pipeline



