gemini-code-assist[bot] commented on code in PR #38833:
URL: https://github.com/apache/beam/pull/38833#discussion_r3506241753
##########
sdks/python/apache_beam/io/gcp/bigquery_file_loads.py:
##########
@@ -735,19 +763,41 @@ def process(
     create_disposition = self.create_disposition
     if self.temporary_tables:
+      destination_table = None
+      hashed_dest = bigquery_tools.get_hashable_destination(table_reference)
+      need_schema = schema is None and hashed_dest not in self.schema_cache
+      need_partitioning = not _has_partitioning_load_parameters(
+          additional_parameters)
+      if need_schema or need_partitioning:
+        try:
+          if hashed_dest in self.destination_table_cache:
+            destination_table = self.destination_table_cache[hashed_dest]
+          else:
+            destination_table = self.bq_wrapper.get_table(
+                project_id=table_reference.projectId,
+                dataset_id=table_reference.datasetId,
+                table_id=table_reference.tableId)
+            self.destination_table_cache[hashed_dest] = destination_table
+        except Exception as e:
+          if need_schema:
+            _LOGGER.warning(
+                "Input schema is absent and could not fetch the final "
+                "destination table's schema [%s]. Creating temp table [%s] "
+                "will likely fail: %s",
+                hashed_dest,
+                job_name,
+                e)
+          destination_table = None
Review Comment:

If `get_table` raises an exception (for example because the destination table
does not exist yet, which is common when the load creates a new table), the
exception is caught but `self.destination_table_cache` is not updated. As a
result, every subsequent element/partition processed in the same bundle for
that destination misses the cache, calls `get_table` again, and fails again.
That adds one failing API round trip per element and can trigger BigQuery API
rate limiting.

To prevent this, cache `None` in `self.destination_table_cache` when the
lookup fails. Because the cache check is
`hashed_dest in self.destination_table_cache`, a cached `None` counts as a hit
and the call to `get_table` is skipped.
```suggestion
        except Exception as e:
          if need_schema:
            _LOGGER.warning(
                "Input schema is absent and could not fetch the final "
                "destination table's schema [%s]. Creating temp table [%s] "
                "will likely fail: %s",
                hashed_dest,
                job_name,
                e)
          destination_table = None
          self.destination_table_cache[hashed_dest] = None
```
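
The effect of caching the failure can be illustrated in isolation. The sketch
below is not Beam code: `TableLookup`, `missing_table`, and the key string are
hypothetical stand-ins, and it only assumes what the diff shows, namely that
the cache is a dict checked with `in`.

```python
class TableLookup:
  """Hypothetical stand-in for the lookup logic in the diff; not Beam code."""

  def __init__(self, fetch):
    self._fetch = fetch  # callable that returns a table or raises
    self.cache = {}
    self.fetch_calls = 0

  def get(self, key):
    # Membership test (not dict.get) so that a cached None counts as a hit.
    if key in self.cache:
      return self.cache[key]
    self.fetch_calls += 1
    try:
      table = self._fetch(key)
    except Exception:
      # Remember the failure so later elements do not repeat the call.
      table = None
    self.cache[key] = table
    return table


def missing_table(key):
  """Simulates a lookup for a destination table that does not exist yet."""
  raise LookupError("table not found: %s" % key)


lookup = TableLookup(missing_table)
results = [lookup.get("project:dataset.table") for _ in range(3)]
print(results, lookup.fetch_calls)
```

One trade-off to keep in mind: a cached `None` stays in place for as long as
the cache lives, so a table created afterwards is not seen until the cache is
reset. Whether that is acceptable depends on when
`self.destination_table_cache` is cleared, which this hunk does not show.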