[ 
https://issues.apache.org/jira/browse/BEAM-11033?focusedWorklogId=562495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-562495
 ]

ASF GitHub Bot logged work on BEAM-11033:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Mar/21 17:45
            Start Date: 08/Mar/21 17:45
    Worklog Time Spent: 10m 
      Work Description: kennknowles commented on a change in pull request #14158:
URL: https://github.com/apache/beam/pull/14158#discussion_r589620958



##########
File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowMetrics.java
##########
@@ -249,14 +249,33 @@ private boolean isMetricTentative(MetricUpdate metricUpdate) {
      */
     private MetricKey getMetricHashKey(MetricUpdate metricUpdate) {
       String fullStepName = metricUpdate.getName().getContext().get("step");
-      if (dataflowPipelineJob.transformStepNames == null
-          || !dataflowPipelineJob.transformStepNames.inverse().containsKey(fullStepName)) {
-        // If we can't translate internal step names to user step names, we just skip them
-        // altogether.
-        return null;
+
+      if (dataflowPipelineJob.getPipelineProto() != null

Review comment:
       nit: It might be more readable to extract the conditionals into a separate 
method, `getUniqueName`, so that each of the `fullStepName = ...` lines can 
just be a `return`.
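
The suggested extraction could look roughly like the sketch below. It is a 
dependency-free illustration, not the code from the pull request: the class 
`StepNameLookup` and its two plain `Map` fields are hypothetical stand-ins for 
the proto transforms map and for `transformStepNames.inverse()`, and the 
fallback branch is assumed to map an internal step name straight to a user 
step name.

```java
import java.util.Map;

/** Hypothetical stand-in for the step-name lookup; not Beam's actual code. */
final class StepNameLookup {
  // Transform ID -> unique (user-facing) name; stands in for the proto's transforms map.
  private final Map<String, String> protoTransformNames;
  // Internal step name -> user step name; stands in for transformStepNames.inverse().
  private final Map<String, String> internalToUserStepNames;

  // Either map may be null, mirroring the null checks in the reviewed diff.
  StepNameLookup(
      Map<String, String> protoTransformNames, Map<String, String> internalToUserStepNames) {
    this.protoTransformNames = protoTransformNames;
    this.internalToUserStepNames = internalToUserStepNames;
  }

  /** Returns the user step name, or null if the step name cannot be translated. */
  String getUniqueName(String fullStepName) {
    // Portable job submission: the step name is a transform ID in the pipeline proto.
    if (protoTransformNames != null && protoTransformNames.containsKey(fullStepName)) {
      return protoTransformNames.get(fullStepName);
    }
    // Otherwise fall back to the internal-step-name mapping.
    if (internalToUserStepNames != null
        && internalToUserStepNames.containsKey(fullStepName)) {
      return internalToUserStepNames.get(fullStepName);
    }
    // Untranslatable step names are skipped by the caller.
    return null;
  }
}
```

With this shape, each branch ends in a `return`, and the caller only needs a 
single null check on the result before building the metric key.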

##########
File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
##########
@@ -140,6 +143,24 @@ public DataflowPipelineJob(
     this.dataflowMetrics = new DataflowMetrics(this, this.dataflowClient);
   }
 
+  /**
+   * Constructs the job.
+   *
+   * @param jobId the job id
+   * @param dataflowOptions used to configure the client for the Dataflow Service
+   * @param transformStepNames a mapping from AppliedPTransforms to Step Names
+   * @param pipelineProto Runner API pipeline proto.
+   */
+  public DataflowPipelineJob(
+      DataflowClient dataflowClient,
+      String jobId,
+      DataflowPipelineOptions dataflowOptions,
+      Map<AppliedPTransform<?, ?, ?>, String> transformStepNames,
+      RunnerApi.Pipeline pipelineProto) {

Review comment:
       Mark the `pipelineProto` parameter `@Nullable`.

##########
File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowMetrics.java
##########
@@ -249,14 +249,33 @@ private boolean isMetricTentative(MetricUpdate metricUpdate) {
      */
     private MetricKey getMetricHashKey(MetricUpdate metricUpdate) {
       String fullStepName = metricUpdate.getName().getContext().get("step");
-      if (dataflowPipelineJob.transformStepNames == null
-          || !dataflowPipelineJob.transformStepNames.inverse().containsKey(fullStepName)) {
-        // If we can't translate internal step names to user step names, we just skip them
-        // altogether.
-        return null;
+
+      if (dataflowPipelineJob.getPipelineProto() != null
+          && dataflowPipelineJob
+              .getPipelineProto()
+              .getComponents()
+              .getTransformsMap()
+              .containsKey(fullStepName)) {
+        // Dataflow Runner v2 with portable job submission uses proto transform map
+        // IDs for step names. Hence we lookup user step names based on the proto.
+        fullStepName =
+            dataflowPipelineJob
+                .getPipelineProto()
+                .getComponents()
+                .getTransformsMap()
+                .get(fullStepName)
+                .getUniqueName();
+      } else {
+        if (dataflowPipelineJob.transformStepNames == null
+            || !dataflowPipelineJob.transformStepNames.inverse().containsKey(fullStepName)) {
+          // If we can't translate internal step names to user step names, we just skip them
+          // altogether.
+          return null;

Review comment:
       Add `@Nullable` to the return type of `getMetricHashKey`, since this 
branch returns `null`.

##########
File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
##########
@@ -121,6 +122,8 @@
 
   private @Nullable String latestStateString;
 
+  private RunnerApi.Pipeline pipelineProto = null;

Review comment:
       Declare this field `final @Nullable`.
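
One way to make such a field `final` is to drop the `= null` initializer and 
have the shorter constructor delegate to the longer one. The sketch below is 
only an illustration under stated assumptions: `JobSketch` is a hypothetical 
stand-in for `DataflowPipelineJob`, a `String` stands in for 
`RunnerApi.Pipeline`, and the `@Nullable` annotation is shown as a comment to 
keep the example free of dependencies.

```java
/** Hypothetical, dependency-free stand-in for DataflowPipelineJob. */
final class JobSketch {
  private final String jobId;

  // Would be annotated @Nullable in the real class; a String stands in for
  // RunnerApi.Pipeline. No "= null" initializer, so the field can be final.
  private final String pipelineProto;

  /** Existing-style constructor: delegates with no pipeline proto. */
  JobSketch(String jobId) {
    this(jobId, null);
  }

  /** New-style constructor: pipelineProto may be null. */
  JobSketch(String jobId, String pipelineProto) {
    this.jobId = jobId;
    this.pipelineProto = pipelineProto;
  }

  String getJobId() {
    return jobId;
  }

  /** Returns the pipeline proto stand-in, or null if none was supplied. */
  String getPipelineProto() {
    return pipelineProto;
  }
}
```

Because every constructor path assigns the field exactly once, the compiler 
accepts `final`, and callers of `getPipelineProto()` must handle `null`.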




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 562495)
    Time Spent: 5h 20m  (was: 5h 10m)

> Update Dataflow metrics processor to handle portable jobs
> ---------------------------------------------------------
>
>                 Key: BEAM-11033
>                 URL: https://issues.apache.org/jira/browse/BEAM-11033
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Chamikara Madhusanka Jayalath
>            Assignee: Chamikara Madhusanka Jayalath
>            Priority: P1
>             Fix For: 2.29.0
>
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently, the Dataflow metrics processor expects metrics returned by the 
> Dataflow service to use the Dataflow internal step names generated for the 
> v1beta3 job description: 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py#L97]
>  
> With portable job submission, however, Dataflow uses the PTransform ID (from 
> the pipeline proto) as the internal step name, so the metrics processor 
> should be updated to handle this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
