pabloem commented on a change in pull request #12960:
URL: https://github.com/apache/beam/pull/12960#discussion_r514516617
##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -731,16 +744,19 @@ def create_temporary_dataset(self, project_id, location):
@retry.with_exponential_backoff(
num_retries=MAX_RETRIES,
retry_filter=retry.retry_on_server_errors_and_timeout_filter)
- def clean_up_temporary_dataset(self, project_id):
+ def clean_up_temporary_dataset(self, project_id, dataset_reference=None):
+ if dataset_reference:
+ project_id = dataset_reference.projectId
temp_table = self._get_temp_table(project_id)
+ dataset_id = dataset_reference.datasetId if dataset_reference \
Review comment:
in beam, we usually break lines using parentheses, kind of like this:
```
dataset_id = (
dataset_reference.datasetId if dataset_reference else
temp_table.datasetId)
```
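For illustration, here is a runnable, self-contained sketch of that parenthesized style; the `_Ref` class and the sample values are stand-ins for the objects referenced in the diff above, not real Beam code:
```
# Hypothetical stand-ins for dataset_reference and temp_table from the diff.
class _Ref(object):
  def __init__(self, datasetId):
    self.datasetId = datasetId

dataset_reference = _Ref('user_dataset')
temp_table = _Ref('temp_dataset_abc')

# Parenthesized continuation instead of a trailing backslash:
dataset_id = (
    dataset_reference.datasetId if dataset_reference else
    temp_table.datasetId)
print(dataset_id)  # -> 'user_dataset'
```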
##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -712,6 +713,7 @@ def __init__(
self.bq_io_metadata = None # Populate in setup, as it may make an RPC
self.bigquery_job_labels = bigquery_job_labels or {}
self.use_json_exports = use_json_exports
+ self.temp_dataset = temp_dataset
Review comment:
how about we add a `get_temporary_dataset` function to
`BigqueryWrapper`, and define `self.temp_dataset = self.temp_dataset
or bq.get_temporary_dataset()` (this logic would need to run in `split`)
- and once we've done that, we can just pass the dataset name around to
every call? That way we will treat the user-defined dataset and the automatic
dataset the same way.
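A minimal sketch of that suggestion, assuming a `get_temporary_dataset` helper on the wrapper and a simplified source class (both hypothetical here; the real `BigqueryWrapper` and the source in `bigquery.py` have different signatures):
```
class _FakeWrapper(object):
  """Stand-in for BigqueryWrapper; get_temporary_dataset is hypothetical."""
  def get_temporary_dataset(self, project_id):
    # The real helper would create or look up Beam's temporary dataset and
    # return its reference; a string placeholder is used here.
    return 'beam_temp_dataset_for_%s' % project_id

class _FakeSource(object):
  def __init__(self, temp_dataset=None):
    self.temp_dataset = temp_dataset  # None means "pick one automatically"

  def split(self, project_id):
    bq = _FakeWrapper()
    # Resolve the dataset once; every downstream call then receives the same
    # value, whether the user supplied it or the wrapper created it.
    self.temp_dataset = (
        self.temp_dataset or bq.get_temporary_dataset(project_id))
    return self.temp_dataset

print(_FakeSource().split('my-project'))           # automatic dataset
print(_FakeSource('user_ds').split('my-project'))  # user-defined dataset
```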
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]