ajamato commented on a change in pull request #12852:
URL: https://github.com/apache/beam/pull/12852#discussion_r561370091
##########
File path:
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializer.java
##########
@@ -31,6 +31,7 @@
/** HttpRequestInitializer for recording request to response latency of
Http-based API calls. */
public class LatencyRecordingHttpRequestInitializer implements
HttpRequestInitializer {
+ // TODO: Import the URN from MonitoringInfoConstants.Urns when it's available
public static final String HISTOGRAM_URN = "
beam:metric:io:api_request_latencies:v1";
Review comment:
Since you are instantiing the latency monitoring info, please add a
MonitoringInfoSpec for it, and pull it its urn as the constant here
https://github.com/apache/beam/blob/eca935bc36036dd11c5f2bd20052b08049375bfc/model/pipeline/src/main/proto/metrics.proto#L69
This would be ideal, as we should not instantiate metrics for URNs without a
MonitoringInfoSpec.
It should be very similar to the API_REQUEST_COUNT, without STATUS label, a
different URN, and a histogram type
i.e. you can use
API_REQUEST_LATENCIES = 20 [(monitoring_info_spec) = {
urn: "beam:metric:io:api_request_latencies:v1",
type: "beam:metrics:histogram_int64:v1",
required_labels: [
"SERVICE",
"METHOD",
"RESOURCE",
"PTRANSFORM"
],
annotations: [
{
key: "description",
value: "Histogram counts for request latencies made to an IO’s
service APIs to batch read or write elements."
},
{
key: "units",
value: "Milliseconds"
},
{
key: "process_metric", // Should be reported as a process metric
// instead of a bundle metric
value: "true"
}
]
}];
##########
File path:
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/LatencyRecordingHttpRequestInitializer.java
##########
@@ -31,6 +31,7 @@
/** HttpRequestInitializer for recording request to response latency of
Http-based API calls. */
public class LatencyRecordingHttpRequestInitializer implements
HttpRequestInitializer {
+ // TODO: Import the URN from MonitoringInfoConstants.Urns when it's available
public static final String HISTOGRAM_URN = "
beam:metric:io:api_request_latencies:v1";
Review comment:
Remove space from start of string.
Please call this API_REQUEST_LATENCIES_URN latencies. Rather than
HISTOGRAM_URN
##########
File path:
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerImpl.java
##########
@@ -411,20 +412,23 @@ public int hashCode() {
}
/**
- * Match a MetricName with a given namespace and a name. If the namespace or
the name is null, it
- * will be ignored for the match.
+ * Match a MetricName with a given metric filter. If the metric filter is
null, the method always
+ * returns true.
*/
private boolean matchMetricName(
- MetricName metricName, @Nullable String namespace, @Nullable String
name) {
- return (namespace == null || namespace.equals(metricName.getNamespace()))
- && (name == null || name.equals(metricName.getName()));
+ MetricName metricName, @Nullable Set<KV<String, String>> metricFilter) {
Review comment:
HashSet<MetricName> metricFilter
I think it would make more sense to make the metricFilter just be a Set of
MetricNames as well. MetricName has a hash code so you can do efficient set
lookups this way as well. It is used with Map types as a key normally, so this
should work.
Also, make the metric you are defining uses a MonitoringInfoMetricName,
which doesn't have a name+namespace concept, it instead has a urn and labels.
Instantiate a MonitoringInfoMetricName for it using the full URN and labels.
Don't parse the URN and put it in a MetricName as a namespace+name
https://github.com/apache/beam/blob/a72460272354747a54449358f5df414be4b6d72c/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricName.java#L105
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java
##########
@@ -70,6 +73,15 @@
/** Tracks bytes written, exposed as "ByteCount" Counter. */
private Counter byteCounter = SinkMetrics.bytesWritten();
+ private Set<KV<String, String>> metricFilter =
Review comment:
I feel like KV isn't an appropriate type here. Isn't there a built in
Pair or Tuple we could use in Java instead?
As KV is designed for pcollections. I see a
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair being used
in the repo:
https://github.com/apache/beam/search?q=%22Pair%3C%22
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java
##########
@@ -70,6 +73,15 @@
/** Tracks bytes written, exposed as "ByteCount" Counter. */
private Counter byteCounter = SinkMetrics.bytesWritten();
+ private Set<KV<String, String>> metricFilter =
+ ImmutableSet.of(
+ KV.of(
+ LatencyRecordingHttpRequestInitializer.HISTOGRAM_URN.split(":",
2)[0],
Review comment:
It looks like you are parsing the URN and treating it as if it were the
"namespace" and "name" of the metric.
However, this metric does not have a namespace+name, and should not be
modeled as such.
Remember, namespace+name is a concept that only applies to user defined
metrics. The namespace and name are simply labels populated with the keys
"namespace" and "name"
See the MonitoringInfoSpec on USER_SUM_INT64
https://github.com/apache/beam/blob/eca935bc36036dd11c5f2bd20052b08049375bfc/model/pipeline/src/main/proto/metrics.proto#L69
This metric should be identified using the urn, and it's labels.
Instantiate a MonitoringInfoMetricName for it instead, and update the
comparison logic to just filter using MetricNames (MonitoringInfoMetricName is
a subclass of MetricName).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]