This is an automated email from the ASF dual-hosted git repository.

yhu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new 98cef8bb617 [GCP] [BigQuery] Handle `totalBytesProcessed` `NoneType` (#27474)
98cef8bb617 is described below

commit 98cef8bb6178c4bc18a0db5bae7eac1c1ea449d6
Author: Dan Hansen <[email protected]>
AuthorDate: Fri Jul 21 09:04:17 2023 -0700

    [GCP] [BigQuery] Handle `totalBytesProcessed` `NoneType` (#27474)
    
    * [GCP] [BigQuery] Handle totalBytesProcessed NoneType
    
    * Update CHANGES.md
    
    * lint / whitespace
    
    ---------
    
    Co-authored-by: Yi Hu <[email protected]>
---
 CHANGES.md                                 |  2 ++
 sdks/python/apache_beam/io/gcp/bigquery.py | 26 ++++++++++++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index dff482b6b74..ec1c112ff4d 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -79,6 +79,8 @@
 
 ## Bugfixes
 
+* Fixed DirectRunner bug in Python SDK where GroupByKey gets empty PCollection and fails when pipeline option `direct_num_workers!=1`. ([#27373](https://github.com/apache/beam/pull/27373))
+* Fixed BigQuery I/O bug when estimating size on queries that utilize row-level security ([#27474](https://github.com/apache/beam/pull/27474))
 * Fixed X (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).
 
 ## Known Issues
diff --git a/sdks/python/apache_beam/io/gcp/bigquery.py b/sdks/python/apache_beam/io/gcp/bigquery.py
index 3fc7bfc3b02..5c1ca4a7d6e 100644
--- a/sdks/python/apache_beam/io/gcp/bigquery.py
+++ b/sdks/python/apache_beam/io/gcp/bigquery.py
@@ -751,8 +751,17 @@ class _CustomBigQuerySource(BoundedSource):
           kms_key=self.kms_key,
           job_labels=self._get_bq_metadata().add_additional_bq_job_labels(
               self.bigquery_job_labels))
-      size = int(job.statistics.totalBytesProcessed)
-      return size
+
+      if job.statistics.totalBytesProcessed is None:
+        # Some queries may not have access to `totalBytesProcessed` as a
+        # result of row-level security.
+        # > BigQuery hides sensitive statistics on all queries against
+        # > tables with row-level security.
+        # See cloud.google.com/bigquery/docs/managing-row-level-security
+        # and cloud.google.com/bigquery/docs/best-practices-row-level-security
+        return None
+
+      return int(job.statistics.totalBytesProcessed)
     else:
       # Size estimation is best effort. We return None as we have
       # no access to the query that we're running.
@@ -1104,8 +1113,17 @@ class _CustomBigQueryStorageSource(BoundedSource):
           kms_key=self.kms_key,
           job_labels=self._get_bq_metadata().add_additional_bq_job_labels(
               self.bigquery_job_labels))
-      size = int(job.statistics.totalBytesProcessed)
-      return size
+
+      if job.statistics.totalBytesProcessed is None:
+        # Some queries may not have access to `totalBytesProcessed` as a
+        # result of row-level security
+        # > BigQuery hides sensitive statistics on all queries against
+        # > tables with row-level security.
+        # See cloud.google.com/bigquery/docs/managing-row-level-security
+        # and cloud.google.com/bigquery/docs/best-practices-row-level-security
+        return None
+
+      return int(job.statistics.totalBytesProcessed)
     else:
       # Size estimation is best effort. We return None as we have
       # no access to the query that we're running.
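
The guard added in both hunks can be illustrated in isolation. This is a minimal sketch, not Beam code: `estimate_size_bytes` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for `job.statistics`. The string value for the visible case is an assumption made for illustration, consistent with the original code wrapping the field in `int()`.

```python
from types import SimpleNamespace
from typing import Optional


def estimate_size_bytes(statistics) -> Optional[int]:
    """Return the byte estimate, or None when the statistic is withheld.

    Hypothetical helper mirroring the guard in the diff above.
    """
    total = statistics.totalBytesProcessed
    if total is None:
        # The statistic is hidden (for example on tables with row-level
        # security), so there is no estimate to report.
        return None
    return int(total)


# Hypothetical stand-ins for `job.statistics`.
visible = SimpleNamespace(totalBytesProcessed="1048576")
hidden = SimpleNamespace(totalBytesProcessed=None)

print(estimate_size_bytes(visible))
print(estimate_size_bytes(hidden))
```

Without the guard, the removed line `int(job.statistics.totalBytesProcessed)` would raise `TypeError` when the field is `None`. Returning `None` instead matches the existing `else` branch, which already treats size estimation as best effort.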
