rizenfrmtheashes opened a new issue, #23670:
URL: https://github.com/apache/beam/issues/23670
### What happened?
We believe that in the Python SDK's GCP IO library, when `WriteToBigQuery` is
used with `file_loads` and dynamic table destinations, temp tables can in rare
scenarios be removed by the `RemoveTempTables` ptransform before the
corresponding copy jobs finish, or even before they are kicked off.
We've only seen this occur under heavy load (many millions of rows) and high
parallelism (Beam 2.40, Dataflow v2 runner, autoscaling from 1 to ~40
n1-standard-4 instances).
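The ordering invariant that appears to be violated can be illustrated with a minimal, stdlib-only sketch (the `FakeJob` class and the function names below are ours for illustration, not Beam's): temp tables must only be deleted after every copy job that reads from them has reached `DONE`.

```python
class FakeJob:
    """Toy stand-in for a BigQuery copy job (hypothetical, for illustration)."""

    def __init__(self, ticks_until_done):
        self.ticks = ticks_until_done
        self.state = 'RUNNING'

    def poll(self):
        # Each poll advances the fake job; real code would query the BQ API.
        self.ticks -= 1
        if self.ticks <= 0:
            self.state = 'DONE'
        return self.state


def remove_temp_tables_safely(copy_jobs, temp_tables, delete_fn):
    """Delete temp tables only once every copy job has finished.

    The bug described above amounts to the deletes running before all
    jobs in copy_jobs reach DONE, so the copy jobs fail with notFound.
    """
    for job in copy_jobs:
        while job.poll() != 'DONE':
            pass  # real code would sleep between polls
    for table in temp_tables:
        delete_fn(table)


# Usage sketch: no delete happens until both fake jobs are DONE.
jobs = [FakeJob(3), FakeJob(1)]
deleted = []
remove_temp_tables_safely(jobs, ['temp_a', 'temp_b'], deleted.append)
```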
We encounter stack traces like:
```
Traceback (most recent call last):
File
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
line 284, in _execute
response = task()
File
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
line 357, in <lambda>
lambda: self.create_worker().do_instruction(request), request)
File
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
line 598, in do_instruction
getattr(request, request_type), request.instruction_id)
File
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
line 635, in process_bundle
bundle_processor.process_bundle(instruction_id))
File
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
line 1004, in process_bundle
element.data)
File
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
line 227, in process_encoded
self.output(decoded_value)
File "apache_beam/runners/worker/operations.py", line 526, in
apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 528, in
apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 237, in
apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 240, in
apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 907, in
apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 908, in
apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1419, in
apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 1507, in
apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1417, in
apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 837, in
apache_beam.runners.common.PerWindowInvoker.invoke_process
File "apache_beam/runners/common.py", line 983, in
apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File
"/usr/local/lib/python3.7/site-packages/steps/bigquery_file_loads_patch_40.py",
line 610, in process
self.bq_wrapper.wait_for_bq_job(job_reference, sleep_duration_sec=10)
File
"/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py",
line 635, in wait_for_bq_job
job_reference.jobId, job.status.errorResult))
RuntimeError: BigQuery job
beam_bq_job_COPY_redactednameofjobhere31945c91d69d91e3019c4b1789457ff0ff7c289120221014000701099659_COPY_STEP_311_98bf2821aecda3f10bbc5b6cf26716f0
failed. Error Result: <ErrorProto
message: 'Not found: Table
REDACTED_GCP_PROJECT_NAME:REDACTED_BQ_DATASET.beam_bq_job_LOAD_prodmetricsstreaming31945c91d69d91e3019c4b1789457ff0ff7c289120221014000701099659_LOAD_STEP_838_5318a60f33ee48e464b734da26b8c43f_01f82b6914fb40bdb98be33b7dd446f4'
reason: 'notFound'> [while running 'Write [REDACTED] to BigQuery File
Loads/BigQueryBatchFileLoads/ParDo(TriggerCopyJobs)/ParDo(TriggerCopyJobs)-ptransform-24']
```
As a note, the `bigquery_file_loads_patch_40.py` in the trace above is a
copy/pasted version of the SDK's `bigquery_file_loads.py` source file (from the
io/gcp section) that we use to backport fixes from newer versions of Beam
(such as #23012). We did dependency checking to make sure the backported fixes
were okay.
You can likely reproduce this with a pipeline like the one in [this doc
here](https://docs.google.com/document/d/1uIM5JVq0dAh2uDB0HfzQN7PS60U8TsJnalvAvfkfcnM/edit?usp=sharing)
(used for a prior bug report), modified to send millions of rows through.
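For reference, a minimal sketch of the shape of such a pipeline (the table
names, schema, and row contents below are placeholders we made up, not taken
from the doc; on Dataflow you would scale the row count and worker count up to
the levels above to have a chance of hitting the race):

```python
def table_fn(row):
    # Hypothetical dynamic-destination router: one table per shard key.
    # Dynamic destinations are what force the temp-table + copy-job path.
    return 'PROJECT:DATASET.events_%d' % row['shard']


def run():
    # Requires apache-beam[gcp]; import deferred so the sketch is importable
    # without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as p:
        _ = (
            p
            | 'MakeRows' >> beam.Create(
                [{'shard': i % 50, 'value': i} for i in range(1000)])
            | 'Write' >> beam.io.WriteToBigQuery(
                table=table_fn,
                schema='shard:INTEGER,value:INTEGER',
                method=beam.io.WriteToBigQuery.Method.FILE_LOADS))
```

To actually exercise the race, `run()` would be submitted to Dataflow with the
usual pipeline options and a much larger `Create` (or a streaming source).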
### Issue Priority
Priority: 2
### Issue Component
Component: io-py-gcp