[
https://issues.apache.org/jira/browse/BEAM-11033?focusedWorklogId=562495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-562495
]
ASF GitHub Bot logged work on BEAM-11033:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 08/Mar/21 17:45
Start Date: 08/Mar/21 17:45
Worklog Time Spent: 10m
Work Description: kennknowles commented on a change in pull request #14158:
URL: https://github.com/apache/beam/pull/14158#discussion_r589620958
##########
File path:
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowMetrics.java
##########
@@ -249,14 +249,33 @@ private boolean isMetricTentative(MetricUpdate metricUpdate) {
*/
private MetricKey getMetricHashKey(MetricUpdate metricUpdate) {
String fullStepName = metricUpdate.getName().getContext().get("step");
- if (dataflowPipelineJob.transformStepNames == null
- || !dataflowPipelineJob.transformStepNames.inverse().containsKey(fullStepName)) {
- // If we can't translate internal step names to user step names, we just skip them
- // altogether.
- return null;
+
+ if (dataflowPipelineJob.getPipelineProto() != null
Review comment:
nit: It might be more readable to move the conditionals into a separate method,
`getUniqueName`, so that each of the `fullStepName = ...` lines can just be a
`return`.
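As a rough illustration of the refactoring suggested above, here is a minimal sketch of what an extracted `getUniqueName` helper could look like. It is not the code from the pull request: the class name `StepNameSketch` is hypothetical, and plain `Map<String, String>` arguments stand in for the Runner API proto's transform map and for `transformStepNames.inverse()`. It also assumes that a successful legacy lookup yields a user step name, which the quoted diff does not show.

```java
import java.util.Map;

// Hypothetical sketch; plain maps replace the real proto and BiMap types.
class StepNameSketch {

  /**
   * Translates a step name reported by the service into a user-facing name.
   *
   * @param reportedStepName the "step" value taken from the metric context
   * @param protoTransformNames transform ID to unique name (stand-in for the
   *     proto's transforms map); may be null for non-portable jobs
   * @param legacyStepNames internal step name to user step name (stand-in for
   *     transformStepNames.inverse()); may be null
   * @return the user-facing name, or null if the name cannot be translated
   */
  static String getUniqueName(
      String reportedStepName,
      Map<String, String> protoTransformNames,
      Map<String, String> legacyStepNames) {
    // Portable job submission: the reported name is a proto transform ID.
    if (protoTransformNames != null && protoTransformNames.containsKey(reportedStepName)) {
      return protoTransformNames.get(reportedStepName);
    }
    // Legacy path: the reported name is a Dataflow-internal step name.
    if (legacyStepNames != null && legacyStepNames.containsKey(reportedStepName)) {
      return legacyStepNames.get(reportedStepName);
    }
    // Neither mapping knows this step, so the caller skips the metric.
    return null;
  }
}
```

With each branch ending in a `return`, the caller no longer needs to reassign `fullStepName` inside nested conditionals; it only has to check the result for null.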
##########
File path:
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
##########
@@ -140,6 +143,24 @@ public DataflowPipelineJob(
this.dataflowMetrics = new DataflowMetrics(this, this.dataflowClient);
}
+ /**
+ * Constructs the job.
+ *
+ * @param jobId the job id
+ * @param dataflowOptions used to configure the client for the Dataflow Service
+ * @param transformStepNames a mapping from AppliedPTransforms to Step Names
+ * @param pipelineProto Runner API pipeline proto.
+ */
+ public DataflowPipelineJob(
+ DataflowClient dataflowClient,
+ String jobId,
+ DataflowPipelineOptions dataflowOptions,
+ Map<AppliedPTransform<?, ?, ?>, String> transformStepNames,
+ RunnerApi.Pipeline pipelineProto) {
Review comment:
`@Nullable`
##########
File path:
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowMetrics.java
##########
@@ -249,14 +249,33 @@ private boolean isMetricTentative(MetricUpdate metricUpdate) {
*/
private MetricKey getMetricHashKey(MetricUpdate metricUpdate) {
String fullStepName = metricUpdate.getName().getContext().get("step");
- if (dataflowPipelineJob.transformStepNames == null
- || !dataflowPipelineJob.transformStepNames.inverse().containsKey(fullStepName)) {
- // If we can't translate internal step names to user step names, we just skip them
- // altogether.
- return null;
+
+ if (dataflowPipelineJob.getPipelineProto() != null
+ && dataflowPipelineJob
+ .getPipelineProto()
+ .getComponents()
+ .getTransformsMap()
+ .containsKey(fullStepName)) {
+ // Dataflow Runner v2 with portable job submission uses proto transform map
+ // IDs for step names. Hence we lookup user step names based on the proto.
+ fullStepName =
+ dataflowPipelineJob
+ .getPipelineProto()
+ .getComponents()
+ .getTransformsMap()
+ .get(fullStepName)
+ .getUniqueName();
+ } else {
+ if (dataflowPipelineJob.transformStepNames == null
+ || !dataflowPipelineJob.transformStepNames.inverse().containsKey(fullStepName)) {
+ // If we can't translate internal step names to user step names, we just skip them
+ // altogether.
+ return null;
Review comment:
Add `@Nullable` to return value.
##########
File path:
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
##########
@@ -121,6 +122,8 @@
private @Nullable String latestStateString;
+ private RunnerApi.Pipeline pipelineProto = null;
Review comment:
`final @Nullable`
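One way to read the `final @Nullable` suggestion: the field can only be `final` if every constructor assigns it exactly once, which is easiest when the older constructor delegates to the new one and passes null. The sketch below illustrates that pattern under stated assumptions. `JobSketch` is a hypothetical class, a `String` stands in for `RunnerApi.Pipeline`, and the locally declared `@Nullable` annotation is only a placeholder for whichever nullness annotation the project actually uses.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;

// Placeholder annotation so the sketch compiles without external libraries.
@Target({ElementType.FIELD, ElementType.PARAMETER, ElementType.METHOD})
@interface Nullable {}

// Hypothetical stand-in for the job class.
class JobSketch {
  private final String jobId;

  // String stands in for RunnerApi.Pipeline; null means no proto was supplied.
  private final @Nullable String pipelineProto;

  // Older-style constructor: delegates, so the final field is still assigned once.
  JobSketch(String jobId) {
    this(jobId, null);
  }

  // Newer constructor: callers may pass null for jobs without a pipeline proto.
  JobSketch(String jobId, @Nullable String pipelineProto) {
    this.jobId = jobId;
    this.pipelineProto = pipelineProto;
  }

  String getJobId() {
    return jobId;
  }

  // Annotated as nullable so callers know to check before dereferencing.
  @Nullable String getPipelineProto() {
    return pipelineProto;
  }
}
```

Dropping the `= null` initializer matters here: a `final` field that is initialized at its declaration cannot be assigned again in a constructor.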
Issue Time Tracking
-------------------
Worklog Id: (was: 562495)
Time Spent: 5h 20m (was: 5h 10m)
> Update Dataflow metrics processor to handle portable jobs
> ---------------------------------------------------------
>
> Key: BEAM-11033
> URL: https://issues.apache.org/jira/browse/BEAM-11033
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Chamikara Madhusanka Jayalath
> Assignee: Chamikara Madhusanka Jayalath
> Priority: P1
> Fix For: 2.29.0
>
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Currently, the Dataflow metrics processor expects the metrics returned by the
> Dataflow service to use the Dataflow-internal step names generated for the
> v1beta3 job description:
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py#L97]
>
> With portable job submission, however, Dataflow uses the PTransform ID (in
> the proto pipeline) as the internal step name. The metrics processor should
> therefore be updated to handle this.