Re: [PR] Filter empty distribution metrics [beam]

via GitHub Mon, 12 Aug 2024 15:27:11 -0700


tvalentyn commented on code in PR #32027:
URL: https://github.com/apache/beam/pull/32027#discussion_r1714416927



##########
sdks/python/apache_beam/metrics/monitoring_infos.py:
##########
@@ -213,11 +216,18 @@ def int64_user_distribution(namespace, name, metric, 
ptransform=None):
     metric: The DistributionData for the metric.
     ptransform: The ptransform id used as a label.
   """
-  labels = create_labels(ptransform=ptransform, namespace=namespace, name=name)
-  payload = _encode_distribution(
-      coders.VarIntCoder(), metric.count, metric.sum, metric.min, metric.max)
-  return create_monitoring_info(
-      USER_DISTRIBUTION_URN, DISTRIBUTION_INT64_TYPE, payload, labels)
+  if metric.count <= 0:

Review Comment:
   nit: I'd prefer to lead the condition with the happy path (`if metric.count 
> 0`)



##########
sdks/python/apache_beam/runners/worker/operations.py:
##########
@@ -605,18 +605,23 @@ def pcollection_count_monitoring_infos(self, 
tag_to_pcollection_id):
         receiver.opcounter.element_counter.value(),
         pcollection=pcollection_id,
     )
+    all_monitoring_infos[monitoring_infos.to_key(elem_count_mi)] = 
elem_count_mi

Review Comment:
   Do we need to check that `elem_count_mi` is not `None` here?



##########
sdks/python/apache_beam/metrics/monitoring_infos.py:
##########
@@ -213,11 +216,18 @@ def int64_user_distribution(namespace, name, metric, 
ptransform=None):
     metric: The DistributionData for the metric.
     ptransform: The ptransform id used as a label.
   """
-  labels = create_labels(ptransform=ptransform, namespace=namespace, name=name)
-  payload = _encode_distribution(
-      coders.VarIntCoder(), metric.count, metric.sum, metric.min, metric.max)
-  return create_monitoring_info(
-      USER_DISTRIBUTION_URN, DISTRIBUTION_INT64_TYPE, payload, labels)
+  if metric.count <= 0:
+    _LOGGER.debug(
+        'Expected a non zero distribution count for %s metric but received %s' 
%
+        (metric.name, metric.count))
+    return

Review Comment:
   Since we return `None` (implicitly), we need to change the method typehint 
to `# type: (...) -> Optional[metrics_pb2.MonitoringInfo]`. 



##########
sdks/python/apache_beam/metrics/monitoring_infos.py:
##########
@@ -214,6 +214,11 @@ def int64_user_distribution(namespace, name, metric, 
ptransform=None):
     ptransform: The ptransform id used as a label.
   """
   labels = create_labels(ptransform=ptransform, namespace=namespace, name=name)
+  if metric.count <= 0:
+    raise TypeError(

Review Comment:
   I think not emitting sounds fine, left some comments.



##########
sdks/python/apache_beam/runners/worker/operations.py:
##########
@@ -605,18 +605,23 @@ def pcollection_count_monitoring_infos(self, 
tag_to_pcollection_id):
         receiver.opcounter.element_counter.value(),
         pcollection=pcollection_id,
     )
+    all_monitoring_infos[monitoring_infos.to_key(elem_count_mi)] = 
elem_count_mi

Review Comment:
   Also, what happens with distribution if the count is 0 after all? For 
example , pipeline reads from a file, but there are no elements, and pipeline 
stops. will this case be handled correctly?



##########
sdks/python/apache_beam/metrics/monitoring_infos.py:
##########
@@ -213,11 +216,18 @@ def int64_user_distribution(namespace, name, metric, 
ptransform=None):
     metric: The DistributionData for the metric.
     ptransform: The ptransform id used as a label.
   """
-  labels = create_labels(ptransform=ptransform, namespace=namespace, name=name)
-  payload = _encode_distribution(
-      coders.VarIntCoder(), metric.count, metric.sum, metric.min, metric.max)
-  return create_monitoring_info(
-      USER_DISTRIBUTION_URN, DISTRIBUTION_INT64_TYPE, payload, labels)
+  if metric.count <= 0:
+    _LOGGER.debug(
+        'Expected a non zero distribution count for %s metric but received %s' 
%

Review Comment:
   In which situation this log message will be actionable? I wonder if we 
should remove this log if it commonly happens (e.g. retries). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Filter empty distribution metrics [beam]

Reply via email to