pabloem commented on a change in pull request #12960:
URL: https://github.com/apache/beam/pull/12960#discussion_r514516617



##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -731,16 +744,19 @@ def create_temporary_dataset(self, project_id, location):
   @retry.with_exponential_backoff(
       num_retries=MAX_RETRIES,
       retry_filter=retry.retry_on_server_errors_and_timeout_filter)
-  def clean_up_temporary_dataset(self, project_id):
+  def clean_up_temporary_dataset(self, project_id, dataset_reference=None):
+    if dataset_reference:
+      project_id = dataset_reference.projectId
     temp_table = self._get_temp_table(project_id)
+    dataset_id = dataset_reference.datasetId if dataset_reference \

Review comment:
       In Beam, we usually break lines using parentheses, kind of like this:
   ```
   dataset_id = (
       dataset_reference.datasetId if dataset_reference else
       temp_table.datasetId)
   ```
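
   For concreteness, here is a minimal standalone sketch of that parenthesized-break style; the `_FakeRef` class and the values below are placeholders for illustration only, not the actual BigQuery reference objects from `bigquery_tools.py`:

```python
# Placeholder stand-in for the DatasetReference / TableReference objects in
# bigquery_tools.py; only the attribute name matches the diff above.
class _FakeRef(object):
  def __init__(self, dataset_id):
    self.datasetId = dataset_id


dataset_reference = _FakeRef('user_supplied_dataset')
temp_table = _FakeRef('beam_temp_dataset_abc123')

# Beam style: break the long conditional with parentheses, not a backslash.
dataset_id = (
    dataset_reference.datasetId if dataset_reference else
    temp_table.datasetId)

print(dataset_id)  # prints: user_supplied_dataset
```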

##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -712,6 +713,7 @@ def __init__(
     self.bq_io_metadata = None  # Populate in setup, as it may make an RPC
     self.bigquery_job_labels = bigquery_job_labels or {}
     self.use_json_exports = use_json_exports
+    self.temp_dataset = temp_dataset

Review comment:
       How about we add a `get_temporary_dataset` function to
`BigqueryWrapper`, and we can define `self.temp_dataset = self.temp_dataset
or bq.get_temporary_dataset()` (this logic would need to run in `split`) -
and once we've done that, we can just pass the dataset name around to
every call? That way we will treat the user-defined dataset and the automatic
dataset the same way.
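
A rough, self-contained sketch of that suggestion; `get_temporary_dataset` and the fake classes below are hypothetical placeholders, not the existing Beam API:

```python
# Hypothetical sketch only: get_temporary_dataset() is the function the
# comment proposes adding to BigQueryWrapper; it does not exist yet, and the
# classes below merely stand in for the real Beam objects.
class _FakeBigQueryWrapper(object):
  def get_temporary_dataset(self):
    # The real method would create (or look up) a temp dataset via the API.
    return 'beam_temp_dataset_abc123'


class _FakeBigQuerySource(object):
  def __init__(self, temp_dataset=None):
    # User-supplied dataset, or None to let the wrapper pick one.
    self.temp_dataset = temp_dataset

  def split(self, desired_bundle_size):
    bq = _FakeBigQueryWrapper()
    # Resolve the dataset once, so a user-defined dataset and an
    # auto-created one are treated identically downstream.
    self.temp_dataset = self.temp_dataset or bq.get_temporary_dataset()
    # From here every later call can just receive self.temp_dataset.
    return []


source = _FakeBigQuerySource()
source.split(desired_bundle_size=1 << 20)
print(source.temp_dataset)  # prints: beam_temp_dataset_abc123
```

The point of resolving the dataset once in `split` is that every later call sees the same value, whether the user supplied the dataset or the wrapper created it.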




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

