rizenfrmtheashes opened a new issue, #23670:
URL: https://github.com/apache/beam/issues/23670

   ### What happened?
   
   We believe that in the Python SDK's GCP IO library, `WriteToBigQuery` with the `file_loads` method and dynamic table destinations can, in rare scenarios, delete temp tables via the `RemoveTempTables` ptransform before the prior copy jobs finish, or even before they are kicked off.
   
   We've only seen this occur under heavy load (many millions of rows) and high parallelism (Beam 2.40, Dataflow v2 runner, autoscaling from 1 to ~40 n1-standard-4 instances).
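
   For reference, here is a minimal sketch of the configuration we are describing. The schema, table names, and routing callable are hypothetical placeholders (our real pipeline is a streaming one and its destinations are redacted); only the `FILE_LOADS` method plus a callable `table` argument matter for this bug.

   ```python
   import apache_beam as beam

   # Hypothetical placeholder schema; ours is redacted.
   SCHEMA = 'kind:STRING,value:FLOAT64'

   def route_to_table(row):
       # Dynamic destinations: each element picks its own table. With
       # FILE_LOADS this can stage rows in temp tables that copy jobs
       # later consolidate into the final tables, after which
       # RemoveTempTables deletes the temp tables.
       return 'my-project:my_dataset.metrics_%s' % row['kind']

   with beam.Pipeline() as p:
       (
           p
           | 'CreateRows' >> beam.Create([{'kind': 'cpu', 'value': 0.5}])
           | 'WriteToBQ' >> beam.io.WriteToBigQuery(
               table=route_to_table,  # callable => dynamic destinations
               schema=SCHEMA,
               method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
               create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           )
       )
   ```

   The race we suspect is between `TriggerCopyJobs` (which issues the copy jobs out of the temp tables) and `RemoveTempTables` (which deletes them): under enough load, the deletion sometimes happens first.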
   
   We encounter stack traces like:
   ```
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 284, in _execute
       response = task()
     File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 357, in <lambda>
       lambda: self.create_worker().do_instruction(request), request)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 598, in do_instruction
       getattr(request, request_type), request.instruction_id)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
       bundle_processor.process_bundle(instruction_id))
     File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1004, in process_bundle
       element.data)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
       self.output(decoded_value)
     File "apache_beam/runners/worker/operations.py", line 526, in apache_beam.runners.worker.operations.Operation.output
     File "apache_beam/runners/worker/operations.py", line 528, in apache_beam.runners.worker.operations.Operation.output
     File "apache_beam/runners/worker/operations.py", line 237, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
     File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
     File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
     File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
     File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
     File "apache_beam/runners/common.py", line 1507, in apache_beam.runners.common.DoFnRunner._reraise_augmented
     File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
     File "apache_beam/runners/common.py", line 837, in apache_beam.runners.common.PerWindowInvoker.invoke_process
     File "apache_beam/runners/common.py", line 983, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
     File "/usr/local/lib/python3.7/site-packages/steps/bigquery_file_loads_patch_40.py", line 610, in process
       self.bq_wrapper.wait_for_bq_job(job_reference, sleep_duration_sec=10)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 635, in wait_for_bq_job
       job_reference.jobId, job.status.errorResult))
   RuntimeError: BigQuery job beam_bq_job_COPY_redactednameofjobhere31945c91d69d91e3019c4b1789457ff0ff7c289120221014000701099659_COPY_STEP_311_98bf2821aecda3f10bbc5b6cf26716f0 failed. Error Result: <ErrorProto
    message: 'Not found: Table REDACTED_GCP_PROJECT_NAME:REDACTED_BQ_DATASET.beam_bq_job_LOAD_prodmetricsstreaming31945c91d69d91e3019c4b1789457ff0ff7c289120221014000701099659_LOAD_STEP_838_5318a60f33ee48e464b734da26b8c43f_01f82b6914fb40bdb98be33b7dd446f4'
    reason: 'notFound'> [while running 'Write [REDACTED] to BigQuery File Loads/BigQueryBatchFileLoads/ParDo(TriggerCopyJobs)/ParDo(TriggerCopyJobs)-ptransform-24']
   ```
   
   As a note, `bigquery_file_loads_patch_40.py` is just a copy/pasted version of the SDK's `apache_beam/io/gcp/bigquery_file_loads.py` that we used to backport fixes from newer versions of Beam (like #23012). We did dependency checking to make sure the backported fixes were okay.
   
   You can likely reproduce this by running a pipeline like the one in [this doc here](https://docs.google.com/document/d/1uIM5JVq0dAh2uDB0HfzQN7PS60U8TsJnalvAvfkfcnM/edit?usp=sharing) (used for a prior bug report), but sending millions of rows through it instead; a hedged load-generation sketch follows.
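
   For illustration, here is a sketch of what "millions of rows" could look like. The volumes, destination fan-out, and names are made up, and this is not the exact pipeline from the linked doc; the point is enough rows spread over enough dynamic destinations to exercise many temp tables and copy jobs at once.

   ```python
   import random

   import apache_beam as beam

   # Made-up volumes: enough rows and enough distinct destinations to
   # force many temp tables and copy jobs across parallel workers.
   NUM_ROWS = 5_000_000
   NUM_DESTINATIONS = 50

   def make_row(i):
       return {'kind': 'kind_%02d' % (i % NUM_DESTINATIONS), 'value': random.random()}

   def route_to_table(row):
       # Hypothetical dynamic-destination callable, as in the sketch above.
       return 'my-project:my_dataset.metrics_%s' % row['kind']

   with beam.Pipeline() as p:
       (
           p
           | 'Ids' >> beam.Create(range(NUM_ROWS))
           | 'MakeRows' >> beam.Map(make_row)
           | 'WriteToBQ' >> beam.io.WriteToBigQuery(
               table=route_to_table,
               schema='kind:STRING,value:FLOAT64',
               method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
               create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           )
       )
   ```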
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: io-py-gcp

