damccorm opened a new issue, #20748:
URL: https://github.com/apache/beam/issues/20748

   I'm running a number of GCP Dataflow jobs to transform tables within GCP 
BigQuery, and they're creating temporary datasets that are not deleted when the 
jobs complete successfully. I'm launching the Dataflow jobs through Airflow / 
GCP Cloud Composer.
   
   The Composer environment's Airflow UI does not reveal anything. When I go 
into GCP Dataflow, click on a job named $BATCH_JOB marked with "Status: 
Succeeded" and "SDK version: 2.27.0", drill into a step within that job (and a 
stage within that step), and then open the Logs window, filter for "LogLevel: 
Error", and click on a log message, I get this:
   
   ```
   Error message from worker: Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
       work_executor.execute()
     File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 226, in execute
       self._split_task)
     File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 234, in _perform_source_split_considering_api_limits
       desired_bundle_size)
     File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 271, in _perform_source_split
       for split in source.split(desired_bundle_size):
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 796, in split
       schema, metadata_list = self._export_files(bq)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 881, in _export_files
       bq.wait_for_bq_job(job_ref)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 525, in wait_for_bq_job
       job_reference.jobId, job.status.errorResult))
   RuntimeError: BigQuery job beam_bq_job_EXPORT_latestrow060a408d75f23074efbacd477228b4b30bc_68cc517f-f_436 failed. Error Result: <ErrorProto message: 'Not found: Table motorefi-analytics:temp_dataset_3a43c81c858e429f871d37802d7ac4f6.temp_table_3a43c81c858e429f871d37802d7ac4f6 was not found in location US' reason: 'notFound'>
   ```
   
   I would provide the equivalent REST description of the batch job, but I'm 
not sure whether it would be helpful or whether it contains sensitive 
information.
   
   I'm not sure whether Beam v2.27.0 is affected by 
https://issues.apache.org/jira/browse/BEAM-6514 or 
https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/609, but I am 
using the Python 3.7 SDK v2.27.0 and not the Java SDK.
   
   
   I'd appreciate any help with this issue.
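   In case it's useful to anyone hitting the same thing: until the leftover 
datasets stop accumulating, they can be garbage-collected on a schedule (e.g. 
from a small Airflow task). Below is a minimal sketch, assuming the leftovers 
follow the `temp_dataset_<32-hex-character>` naming visible in the traceback 
above and that the `google-cloud-bigquery` client library is available; the 
function names and the `project` argument are illustrative, not part of Beam.

```python
import re

# Beam's BigQuery export path appears to create datasets named like
# "temp_dataset_<32 hex chars>" -- an assumption based on the dataset id
# shown in the traceback above; adjust the pattern for your environment.
_TEMP_DATASET_RE = re.compile(r"temp_dataset_[0-9a-f]{32}")


def is_beam_temp_dataset(dataset_id: str) -> bool:
    """Return True if a dataset id looks like a Beam temporary export dataset."""
    return _TEMP_DATASET_RE.fullmatch(dataset_id) is not None


def cleanup_temp_datasets(project: str) -> None:
    """Delete leftover Beam temp datasets in `project`. Irreversible!"""
    # Imported lazily so the matcher above can be used without GCP
    # credentials or the google-cloud-bigquery dependency installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    for dataset in client.list_datasets():
        if is_beam_temp_dataset(dataset.dataset_id):
            client.delete_dataset(
                dataset.reference, delete_contents=True, not_found_ok=True
            )
```

   One caveat with this approach: a dataset matching the pattern may still be 
in use by a running job, so it would be safer to also filter by dataset 
creation time and only delete datasets older than the longest job you run.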
   
   Imported from Jira 
[BEAM-11905](https://issues.apache.org/jira/browse/BEAM-11905). Original Jira 
may contain additional context.
   Reported by: yingw787.

