ihji commented on a change in pull request #12754:
URL: https://github.com/apache/beam/pull/12754#discussion_r483286311



##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -1198,8 +1209,34 @@ def process(self, element, *schema_side_inputs):
       return self._flush_all_batches()
 
   def finish_bundle(self):
+    current_millis = int(time.time() * 1000)
+    if BigQueryWriteFn.LATENCY_LOGGING_LOCK.acquire(False):
+      try:
+        if (BigQueryWriteFn.LATENCY_LOGGING_HISTOGRAM.total_count() > 0 and
+            (current_millis -
+             BigQueryWriteFn.LATENCY_LOGGING_LAST_REPORTED_MILLIS) >
+            self._latency_logging_frequency * 1000):
+          self._log_percentiles()
+          BigQueryWriteFn.LATENCY_LOGGING_HISTOGRAM.clear()

Review comment:
      > Is there a reason why you clear? I think we need to have a good sample 
of elements over a period of time to get a decent latency breakdown. This seems 
like it will be cleared quite frequently.
   
   The default interval is 3 minutes, so the histogram isn't cleared that 
frequently. The interval is also configurable, so we could increase it by 
specifying a pipeline option.
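   To illustrate, the condition in the diff boils down to the pattern below (a simplified standalone sketch: the non-blocking lock acquire and the 180-second default mirror the diff, but the class and method names here are hypothetical stand-ins, and a plain list stands in for the latency histogram):

```python
import threading
import time


class LatencyLogger:
  """Simplified sketch of the periodic flush-and-clear pattern in the diff."""

  _lock = threading.Lock()
  _last_reported_millis = 0
  _records = []  # stand-in for the latency histogram

  def __init__(self, latency_logging_frequency=180):
    # Reporting interval in seconds; the 3-minute default matches the PR.
    self._latency_logging_frequency = latency_logging_frequency

  def maybe_report(self):
    current_millis = int(time.time() * 1000)
    # Non-blocking acquire: if another bundle is already reporting, skip
    # this round instead of waiting on the lock.
    if LatencyLogger._lock.acquire(False):
      try:
        elapsed = current_millis - LatencyLogger._last_reported_millis
        if (LatencyLogger._records and
            elapsed > self._latency_logging_frequency * 1000):
          # Report percentiles here, then start a fresh window.
          LatencyLogger._records.clear()
          LatencyLogger._last_reported_millis = current_millis
      finally:
        LatencyLogger._lock.release()
```

So samples only accumulate for at most one interval before being reported and dropped; raising `latency_logging_frequency` directly widens the sample window.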
   
   > In the logging case, we would like to have something similar. I'm not 
saying you need to make such a complex calculation to move things in and out of 
the window.
   
   Yes, it would have been great to use a sliding window for histogram 
recording. We could make the histogram object time-aware in the future.
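   One rough sketch of what that time awareness could look like (hypothetical, not part of this PR): keep per-interval sub-counters keyed by bucket start time and drop buckets that fall outside the window, so old samples age out gradually instead of the whole histogram being cleared at once.

```python
import collections
import time


class SlidingWindowHistogram:
  """Hypothetical time-aware histogram: samples live in per-interval
  sub-counters and age out once they fall outside window_secs."""

  def __init__(self, window_secs=180, bucket_secs=30):
    self._window_secs = window_secs
    self._bucket_secs = bucket_secs
    # Maps bucket start time (seconds) -> Counter of recorded values.
    self._buckets = collections.OrderedDict()

  def record(self, value, now=None):
    now = time.time() if now is None else now
    start = int(now // self._bucket_secs) * self._bucket_secs
    self._buckets.setdefault(start, collections.Counter())[value] += 1
    self._expire(now)

  def total_count(self, now=None):
    self._expire(time.time() if now is None else now)
    return sum(sum(c.values()) for c in self._buckets.values())

  def _expire(self, now):
    # Drop every bucket that ended before the start of the window.
    cutoff = now - self._window_secs
    for start in list(self._buckets):
      if start + self._bucket_secs <= cutoff:
        del self._buckets[start]
```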
   
   > Though, maybe this is perfectly fine, for 180 seconds we can probably 
record a decent number of requests.
   
   😄 



