[GitHub] [beam] ajamato commented on a change in pull request #12852: [BEAM-10890] Log error counts to debug BigQuery streaming insert requ…

GitBox Wed, 20 Jan 2021 15:36:43 -0800


ajamato commented on a change in pull request #12852:
URL: https://github.com/apache/beam/pull/12852#discussion_r561370091




##########
File path: 
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializer.java
##########
@@ -31,6 +31,7 @@
 
 /** HttpRequestInitializer for recording request to response latency of 
Http-based API calls. */
 public class LatencyRecordingHttpRequestInitializer implements 
HttpRequestInitializer {
+  // TODO: Import the URN from MonitoringInfoConstants.Urns when it's available
   public static final String HISTOGRAM_URN = " 
beam:metric:io:api_request_latencies:v1";

Review comment:
       Since you are instantiing the latency monitoring info, please add a 
MonitoringInfoSpec for it, and pull it its urn as the constant here
   
https://github.com/apache/beam/blob/eca935bc36036dd11c5f2bd20052b08049375bfc/model/pipeline/src/main/proto/metrics.proto#L69
   
   This would be ideal, as we should not instantiate metrics for URNs without a 
MonitoringInfoSpec.
   
   
   It should be very similar to the API_REQUEST_COUNT, without STATUS label, a 
different URN, and a histogram type
   
   i.e. you can use
   
       API_REQUEST_LATENCIES = 20 [(monitoring_info_spec) = {
         urn: "beam:metric:io:api_request_latencies:v1",
         type: "beam:metrics:histogram_int64:v1",
         required_labels: [
           "SERVICE",
           "METHOD",
           "RESOURCE",
           "PTRANSFORM"
         ],
         annotations: [
            {
               key: "description",
               value: "Histogram counts for request latencies made to an IO’s 
service APIs to batch read or write elements."
           }, 
           {
             key: "units",
             value: "Milliseconds"
           },
           {
             key: "process_metric",  // Should be reported as a process metric
                                     // instead of a bundle metric
             value: "true"
           }
         ]
       }];
   

##########
File path: 
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializer.java
##########
@@ -31,6 +31,7 @@
 
 /** HttpRequestInitializer for recording request to response latency of 
Http-based API calls. */
 public class LatencyRecordingHttpRequestInitializer implements 
HttpRequestInitializer {
+  // TODO: Import the URN from MonitoringInfoConstants.Urns when it's available
   public static final String HISTOGRAM_URN = " 
beam:metric:io:api_request_latencies:v1";

Review comment:
       Remove space from start of string.
   
   Please call this API_REQUEST_LATENCIES_URN latencies. Rather than 
HISTOGRAM_URN
   

##########
File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerImpl.java
##########
@@ -411,20 +412,23 @@ public int hashCode() {
   }
 
   /**
-   * Match a MetricName with a given namespace and a name. If the namespace or 
the name is null, it
-   * will be ignored for the match.
+   * Match a MetricName with a given metric filter. If the metric filter is 
null, the method always
+   * returns true.
    */
   private boolean matchMetricName(
-      MetricName metricName, @Nullable String namespace, @Nullable String 
name) {
-    return (namespace == null || namespace.equals(metricName.getNamespace()))
-        && (name == null || name.equals(metricName.getName()));
+      MetricName metricName, @Nullable Set<KV<String, String>> metricFilter) {

Review comment:
       HashSet<MetricName> metricFilter
   I think it would make more sense to make the metricFilter just be a Set of 
MetricNames as well.  MetricName has a hash code so you can do efficient set 
lookups this way as well. It is used with Map types as a key normally, so this 
should work.
   
   Also, make the metric  you are defining uses a MonitoringInfoMetricName, 
which doesn't have a name+namespace concept, it instead has a urn and labels. 
Instantiate a MonitoringInfoMetricName for it using the full URN and labels. 
Don't parse the URN and put it in a MetricName as a namespace+name
   
   
https://github.com/apache/beam/blob/a72460272354747a54449358f5df414be4b6d72c/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricName.java#L105
   
   
   
   

##########
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java
##########
@@ -70,6 +73,15 @@
   /** Tracks bytes written, exposed as "ByteCount" Counter. */
   private Counter byteCounter = SinkMetrics.bytesWritten();
 
+  private Set<KV<String, String>> metricFilter =

Review comment:
       I feel like KV isn't an appropriate type here. Isn't there a built in 
Pair or Tuple we could use in Java instead?
   As KV is designed for pcollections. I see a 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair being used 
in the repo:
   https://github.com/apache/beam/search?q=%22Pair%3C%22
   
   

##########
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java
##########
@@ -70,6 +73,15 @@
   /** Tracks bytes written, exposed as "ByteCount" Counter. */
   private Counter byteCounter = SinkMetrics.bytesWritten();
 
+  private Set<KV<String, String>> metricFilter =
+      ImmutableSet.of(
+          KV.of(
+              LatencyRecordingHttpRequestInitializer.HISTOGRAM_URN.split(":", 
2)[0],

Review comment:
       It looks like you are parsing the URN and treating it as if it were the 
"namespace" and "name" of the metric.
   
   However, this metric does not have a namespace+name, and should not be 
modeled as such.
   
   Remember, namespace+name is a concept that only applies to user defined 
metrics. The namespace and name are simply labels populated with the keys 
"namespace" and "name"
   See the MonitoringInfoSpec on USER_SUM_INT64
   
https://github.com/apache/beam/blob/eca935bc36036dd11c5f2bd20052b08049375bfc/model/pipeline/src/main/proto/metrics.proto#L69
   
   This metric should be identified using the urn, and it's labels.
   Instantiate a MonitoringInfoMetricName for it instead, and update the 
comparison logic to just filter using MetricNames (MonitoringInfoMetricName is 
a subclass of MetricName).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] ajamato commented on a change in pull request #12852: [BEAM-10890] Log error counts to debug BigQuery streaming insert requ…

Reply via email to