[GitHub] [beam] ihji commented on a change in pull request #12754: [BEAM-10791] Identify and log additional information needed to debug …

GitBox Thu, 03 Sep 2020 15:19:16 -0700


ihji commented on a change in pull request #12754:
URL: https://github.com/apache/beam/pull/12754#discussion_r483283672




##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -1198,8 +1209,34 @@ def process(self, element, *schema_side_inputs):
       return self._flush_all_batches()
 
   def finish_bundle(self):
+    current_millis = int(time.time() * 1000)
+    if BigQueryWriteFn.LATENCY_LOGGING_LOCK.acquire(False):
+      try:
+        if (BigQueryWriteFn.LATENCY_LOGGING_HISTOGRAM.total_count() > 0 and
+            (current_millis -
+             BigQueryWriteFn.LATENCY_LOGGING_LAST_REPORTED_MILLIS) >
+            self._latency_logging_frequency * 1000):
+          self._log_percentiles()
+          BigQueryWriteFn.LATENCY_LOGGING_HISTOGRAM.clear()
+          BigQueryWriteFn.LATENCY_LOGGING_LAST_REPORTED_MILLIS = current_millis
+      finally:
+        BigQueryWriteFn.LATENCY_LOGGING_LOCK.release()
     return self._flush_all_batches()
 
+  @classmethod
+  def _log_percentiles(cls):
+    # Note that the total count and each percentile value may not be correlated
+    # each other. Histogram releases lock between each percentile calculation
+    # so additional latencies could be added anytime.
+    # pylint: disable=round-builtin
+    _LOGGER.info(

Review comment:
       Each of the histogram methods acquires the same lock, so the value 
cannot change during a single call such as `p50()`. The interleaving 
`thread 1: p50() -> thread 2: record() -> thread 1: p90()` is therefore 
possible, but `thread 1: p50()` and `thread 2: record()` running at the same 
time is impossible.
   
   I didn't add a single locked method that computes multiple percentiles 
together, because of performance concerns. Also, this logging is for 
informational purposes only, so a small misalignment between the values 
should be acceptable.
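
   To illustrate the trade-off, here is a minimal sketch. `ToyHistogram`, 
its `snapshot()` method, and the integer percentile rule are hypothetical 
and only stand in for the real histogram class, which this comment does not 
show. Each method takes the same lock, so a single call is atomic, but 
another `record()` can land between two calls.

```python
import threading


class ToyHistogram(object):
  """Hypothetical stand-in for the histogram; not the Beam implementation."""
  def __init__(self):
    self._lock = threading.Lock()
    self._values = []

  def record(self, value):
    with self._lock:
      self._values.append(value)

  def total_count(self):
    with self._lock:
      return len(self._values)

  def _percentile_unlocked(self, percent):
    # Caller must already hold self._lock.
    if not self._values:
      return None
    ordered = sorted(self._values)
    index = min(len(ordered) - 1, (percent * len(ordered)) // 100)
    return ordered[index]

  def p50(self):
    with self._lock:
      return self._percentile_unlocked(50)

  def p90(self):
    with self._lock:
      return self._percentile_unlocked(90)

  def snapshot(self):
    # Alternative design: hold the lock once for every reported value.
    # The values are mutually consistent, but writers wait longer.
    with self._lock:
      return (
          len(self._values),
          self._percentile_unlocked(50),
          self._percentile_unlocked(90))


hist = ToyHistogram()
for latency in range(1, 11):
  hist.record(latency)

consistent = hist.snapshot()

# Per-method locking: simulate another thread recording between two calls.
p50_first = hist.p50()
hist.record(1000)
hist.record(2000)
p90_second = hist.p90()
count_after = hist.total_count()
```

   Here `p50_first` reflects 10 recorded values while `p90_second` reflects 
12, which is the small misalignment described above. A `snapshot()`-style 
method avoids it at the cost of holding the lock across all calculations.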




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
