[
https://issues.apache.org/jira/browse/BEAM-11359?focusedWorklogId=609992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609992
]
ASF GitHub Bot logged work on BEAM-11359:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 10/Jun/21 23:24
Start Date: 10/Jun/21 23:24
Worklog Time Spent: 10m
Work Description: pabloem commented on a change in pull request #14745:
URL: https://github.com/apache/beam/pull/14745#discussion_r649593016
##########
File path: sdks/python/apache_beam/io/gcp/bigquery_read_internal.py
##########
@@ -183,15 +183,16 @@ def process(self,
element: 'ReadFromBigQueryRequest') -> Iterable[BoundedSource]:
bq = bigquery_tools.BigQueryWrapper(
temp_dataset_id=self._get_temp_dataset().datasetId)
- # TODO(BEAM-11359): Clean up temp dataset at pipeline completion.
if element.query is not None:
self._setup_temporary_dataset(bq, element)
table_reference = self._execute_query(bq, element)
+ created_temp_dataset = True
Review comment:
I don't think this is enough to be sure that we actually created the dataset:
setting the flag unconditionally does not distinguish a dataset we created
from one that already existed (e.g. one passed in by the user).
You may need to change `_setup_temporary_dataset`, and this code:
https://github.com/apache/beam/blob/2aed67b1fbacce923e22347400251c34a1f6ab2c/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L788-L814
so that they return something to the caller indicating whether the dataset was
created or not.
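A minimal sketch of the reviewer's suggestion, assuming a hypothetical in-memory client. `FakeBigQueryClient`, `create_temporary_dataset` and `QueryReader` are illustrative names only, not Beam's actual API; the real helper is the bigquery_tools.py code linked in the comment. The point is that the dataset-creating helper reports whether it created the dataset, and the caller deletes the dataset only if it owns it.

```python
from typing import Iterable, Set


class FakeBigQueryClient:
  """Hypothetical in-memory stand-in for a BigQuery client (not a Beam API)."""

  def __init__(self, existing_datasets: Iterable[str] = ()):
    self.datasets: Set[str] = set(existing_datasets)

  def dataset_exists(self, dataset_id: str) -> bool:
    return dataset_id in self.datasets

  def create_dataset(self, dataset_id: str) -> None:
    self.datasets.add(dataset_id)

  def delete_dataset(self, dataset_id: str) -> None:
    self.datasets.discard(dataset_id)


def create_temporary_dataset(
    client: FakeBigQueryClient, dataset_id: str) -> bool:
  """Creates dataset_id if missing; returns True only if this call created it."""
  if client.dataset_exists(dataset_id):
    # Dataset was supplied by the user or created earlier: not ours to delete.
    return False
  client.create_dataset(dataset_id)
  return True


class QueryReader:
  """Hypothetical caller that tracks whether it owns the temp dataset."""

  def __init__(self, client: FakeBigQueryClient, dataset_id: str):
    self._client = client
    self._dataset_id = dataset_id
    self._created_temp_dataset = False

  def setup_temporary_dataset(self) -> None:
    # Use the helper's return value instead of assuming the dataset was created.
    created = create_temporary_dataset(self._client, self._dataset_id)
    # Keep ownership once acquired, even if a later call finds it existing.
    self._created_temp_dataset = self._created_temp_dataset or created

  def cleanup(self) -> None:
    # Delete only a dataset this reader created; leave received ones alone.
    if self._created_temp_dataset:
      self._client.delete_dataset(self._dataset_id)
      self._created_temp_dataset = False


if __name__ == "__main__":
  client = FakeBigQueryClient(existing_datasets={"user_dataset"})

  owned = QueryReader(client, "temp_dataset")
  owned.setup_temporary_dataset()  # creates temp_dataset, so it is owned
  owned.cleanup()                  # deletes it again
  assert not client.dataset_exists("temp_dataset")

  received = QueryReader(client, "user_dataset")
  received.setup_temporary_dataset()  # already exists, so not owned
  received.cleanup()                  # no-op
  assert client.dataset_exists("user_dataset")
```

The `or` in `setup_temporary_dataset` matters when setup runs once per element: the second call finds the dataset the reader itself created and returns False, and overwriting the flag with that value would leak the dataset.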
Issue Time Tracking
-------------------
Worklog Id: (was: 609992)
Time Spent: 4h 50m (was: 4h 40m)
> Clean up temporary dataset after ReadAllFromBQ executes
> -------------------------------------------------------
>
> Key: BEAM-11359
> URL: https://issues.apache.org/jira/browse/BEAM-11359
> Project: Beam
> Issue Type: Improvement
> Components: io-py-gcp
> Reporter: Pablo Estrada
> Priority: P3
> Time Spent: 4h 50m
> Remaining Estimate: 0h
>
> Currently, the transform creates (or receives) a temp dataset but does not
> clean it up. Only one is created per pipeline, so the impact is small, but it
> is still not ideal.