[ https://issues.apache.org/jira/browse/BEAM-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139956#comment-17139956 ]

Chamikara Madhusanka Jayalath commented on BEAM-6514:
-----------------------------------------------------

The current workaround is to run a secondary program that cleans up temporary 
resources left behind by failed jobs. For example, the secondary program could 
use the Dataflow API to monitor job status and delete the resources of jobs 
that failed.
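
A minimal sketch of such a cleanup program, using the google-cloud-bigquery 
client (the "temp_dataset_" prefix is an assumption about how the temporary 
query datasets are named, so verify it against your project; a fuller version 
could also call the Dataflow jobs.list API and act only on jobs in a failed or 
cancelled state):

{code:java}
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQuery.DatasetDeleteOption;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;

/** Hypothetical cleanup program for temp datasets left behind by failed jobs. */
public class TempResourceCleaner {
  public static void main(String[] args) {
    String projectId = args[0]; // GCP project that ran the Dataflow jobs
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    for (Dataset dataset : bigquery.listDatasets(projectId).iterateAll()) {
      String name = dataset.getDatasetId().getDataset();
      // Assumed naming prefix for Beam's temporary query datasets.
      if (name.startsWith("temp_dataset_")) {
        // deleteContents() also drops any temp tables inside the dataset.
        bigquery.delete(dataset.getDatasetId(), DatasetDeleteOption.deleteContents());
        System.out.println("Deleted leftover temp dataset: " + name);
      }
    }
  }
}
{code}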

> Dataflow Batch Job Failure is leaving Datasets/Tables behind in BigQuery
> ------------------------------------------------------------------------
>
>                 Key: BEAM-6514
>                 URL: https://issues.apache.org/jira/browse/BEAM-6514
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: Rumeshkrishnan Mohan
>            Assignee: Chamikara Madhusanka Jayalath
>            Priority: P2
>              Labels: stale-assigned
>
> Dataflow leaves datasets and tables behind in BigQuery when the pipeline is 
> cancelled or fails. Whenever I cancelled a job or it failed at run time, it 
> left behind a temporary dataset and table in BigQuery.
>  # The `cleanupTempResource` method deletes the temporary tables and dataset 
> after a batch job succeeds.
>  # If the job fails midway or is cancelled explicitly, the temporary dataset 
> and tables remain. I do see a table expiration period of 1 day set in the 
> `getTableToExtract` function in BigQueryQuerySource.java (see the sketch 
> after this list).
>  # I understand that keeping the temp tables and dataset after a failure can 
> help with debugging.
>  # Could we have an optional pipeline or job parameter that cleans up the 
> temporary dataset and tables when a job is cancelled or fails?
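> A minimal sketch of setting such a 1-day expiration on a temp table with the 
> google-cloud-bigquery client (the project, dataset, and table names here are 
> hypothetical; Beam generates its own):
> {code:java}
> import com.google.cloud.bigquery.BigQuery;
> import com.google.cloud.bigquery.BigQueryOptions;
> import com.google.cloud.bigquery.Table;
> import com.google.cloud.bigquery.TableId;
> 
> public class TempTableExpiry {
>   public static void main(String[] args) {
>     BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
>     // Hypothetical ids; Beam generates its own temp dataset/table names.
>     TableId tableId = TableId.of("my-project", "temp_dataset_abc", "temp_table_abc");
>     Table table = bigquery.getTable(tableId);
>     long oneDayMs = 24L * 60 * 60 * 1000;
>     // Expire the table one day from now, matching the 1-day period above.
>     table.toBuilder()
>         .setExpirationTime(System.currentTimeMillis() + oneDayMs)
>         .build()
>         .update();
>   }
> }
> {code}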



