Scott Wegner created BEAM-2450:
----------------------------------
Summary: Transform names and named applications should not be null or empty
Key: BEAM-2450
URL: https://issues.apache.org/jira/browse/BEAM-2450
Project: Beam
Issue Type: Bug
Components: beam-model, sdk-java-core, sdk-py
Reporter: Scott Wegner
Assignee: Frances Perry
Priority: Minor
The Beam SDK allows setting the name of a transform [1] and also naming the
transform application [2]. If no name is specified on application, the name of
the transform is used. If no name is specified for the transform, the class
name is used.
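The fallback order above can be sketched as follows. This is a hypothetical illustration of the resolution order only, not Beam's actual implementation. The class TransformNaming, the method resolveName, and the stand-in transform class ParseEvents are invented for this example, and Optional.empty() stands for "no name specified".

```java
import java.util.Optional;

// Hypothetical sketch of the naming fallback; not part of the Beam SDK.
class TransformNaming {

    // Resolves the displayed name of an applied transform:
    // the application name if given, else the transform's own name,
    // else the transform's simple class name. Optional.or requires Java 9+.
    static String resolveName(Optional<String> applicationName,
                              Optional<String> transformName,
                              Class<?> transformClass) {
        return applicationName
                .or(() -> transformName)
                .orElseGet(transformClass::getSimpleName);
    }

    // Hypothetical stand-in for a user-defined transform class.
    static class ParseEvents {}

    public static void main(String[] args) {
        // Application name wins when present: prints "ReadInput".
        System.out.println(resolveName(Optional.of("ReadInput"), Optional.of("Parse"), ParseEvents.class));
        // Falls back to the transform name: prints "Parse".
        System.out.println(resolveName(Optional.empty(), Optional.of("Parse"), ParseEvents.class));
        // Falls back to the class name: prints "ParseEvents".
        System.out.println(resolveName(Optional.empty(), Optional.empty(), ParseEvents.class));
    }
}
```

Note that in this sketch an explicitly empty string is a present value and is used as-is, which mirrors the under-specified behavior described below.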
The application name serves as metadata for the applied PTransforms in the
constructed graph. They are effectively extra display data (historically,
PTransform names predate display data). The names are used by runners for UI
and monitoring applications, such as the displayed pipeline graph in the
Dataflow Monitoring UI [3].
Currently there is no explicit validation on the specified application name.
The current behavior seems to be:
* null application names cause a NullPointerException at construction time.
* Specifying the empty string compiles and succeeds in the DirectRunner, but
causes strange behavior in Dataflow when rendering the graph in the UI. I have
not tested the behavior of other runners.
We should add explicit validation in the model on the specified transform name
and application name. I propose that we disallow null and empty names.
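A minimal sketch of the proposed check, assuming it runs wherever a transform name or application name is set. TransformNameValidation and validateName are hypothetical names, not existing Beam SDK APIs.

```java
import java.util.Objects;

// Hypothetical sketch of the proposed validation; not part of the Beam SDK.
class TransformNameValidation {

    // Rejects null and empty names, per the proposal; returns the name unchanged otherwise.
    // "kind" is only used to build the error message (e.g. "Transform" or "Application").
    static String validateName(String name, String kind) {
        Objects.requireNonNull(name, kind + " name must not be null");
        if (name.isEmpty()) {
            throw new IllegalArgumentException(kind + " name must not be empty");
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(validateName("ReadInput", "Application"));
        for (String bad : new String[] {null, ""}) {
            try {
                validateName(bad, "Application");
            } catch (NullPointerException | IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```

Throwing NullPointerException for null and IllegalArgumentException for the empty string follows common Java convention; a single exception type would work equally well. The proposal only covers null and empty names, so a whitespace-only name such as " " still passes this sketch.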
This is technically a breaking change as the SDK currently allows the empty
string, but only because it is under-specified. The upgrade path for any
pipelines broken by this change is simple: specify a non-empty name or fall back
to the default class name.
[1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L236
[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollection.java#L295
[3] https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf#viewing-a-pipeline