ihji commented on a change in pull request #12754:
URL: https://github.com/apache/beam/pull/12754#discussion_r483282523



##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -1198,8 +1209,34 @@ def process(self, element, *schema_side_inputs):
       return self._flush_all_batches()
 
   def finish_bundle(self):
+    current_millis = int(time.time() * 1000)
+    if BigQueryWriteFn.LATENCY_LOGGING_LOCK.acquire(False):
+      try:
+        if (BigQueryWriteFn.LATENCY_LOGGING_HISTOGRAM.total_count() > 0 and
+            (current_millis -
+             BigQueryWriteFn.LATENCY_LOGGING_LAST_REPORTED_MILLIS) >
+            self._latency_logging_frequency * 1000):
+          self._log_percentiles()
+          BigQueryWriteFn.LATENCY_LOGGING_HISTOGRAM.clear()
+          BigQueryWriteFn.LATENCY_LOGGING_LAST_REPORTED_MILLIS = current_millis
+      finally:
+        BigQueryWriteFn.LATENCY_LOGGING_LOCK.release()
     return self._flush_all_batches()
 
+  @classmethod
+  def _log_percentiles(cls):
+    # Note that the total count and each percentile value may not be correlated

Review comment:
      Which means, for example, that between the `p90()` and `p50()` calls, `record` might be called from a different thread, so `p90()` computes its percentile over 100 recorded values while the following `p50()` computes its percentile over 101. This happens because we lock each of `p90()`, `p50()` and `record` individually, not the combined `p90() + p50()` sequence. Hope this explanation is clear 😅
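
To illustrate, here is a minimal sketch, assuming a hypothetical `LockedHistogram` class (not Beam's actual histogram implementation) where each public method acquires the lock on its own:

```python
import bisect
import threading


class LockedHistogram(object):
  """Hypothetical histogram: each public method acquires the lock on its own."""

  def __init__(self):
    self._lock = threading.Lock()
    self._values = []  # kept sorted so percentiles are a simple index lookup

  def record(self, value):
    with self._lock:
      bisect.insort(self._values, value)

  def total_count(self):
    with self._lock:
      return len(self._values)

  def percentile(self, p):
    with self._lock:
      index = min(len(self._values) - 1, int(len(self._values) * p / 100.0))
      return self._values[index]


histogram = LockedHistogram()
for latency_ms in range(100):
  histogram.record(latency_ms)

# Reporting code: the lock is released between these two calls, so a writer
# thread can slip a record() in between -- p90 is then computed over 100
# values while the following p50 is computed over 101.
p90 = histogram.percentile(90)
histogram.record(250)  # stands in for a concurrent record() from another thread
p50 = histogram.percentile(50)
print(p90, p50, histogram.total_count())
```

Getting a consistent report would require computing the count and all percentiles under a single lock acquisition (e.g. one snapshot-style method) instead of locking each getter separately.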



