tomaslink opened a new issue, #38017:
URL: https://github.com/apache/beam/issues/38017
### What happened?
I have a Python Dataflow streaming pipeline that reads from Pub/Sub, does
some processing, and writes to BigQuery. Using `STORAGE_WRITE_API` works fine,
but when I switch to `FILE_LOADS` to reduce costs, I get the following
exception:
```text
Error message from worker: generic::unknown: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1498, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 912, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1057, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_file_loads.py", line 528, in process
    self.process_one(element, job_name_prefix)
  File "/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_file_loads.py", line 575, in process_one
    self.bq_wrapper.wait_for_bq_job(job_reference, sleep_duration_sec=10)
  File "/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 690, in wait_for_bq_job
    raise RuntimeError(
RuntimeError: BigQuery job beam_bq_job_COPY_pipenmeafileloads_COPY_STEP_e2c58135db064a1e869f3633a7a1b037_0b8da1df1d54f6591407114d91108002 failed. Error Result: <ErrorProto
 message: 'Failed to copy Non partitioned table to Column partitioned table: not supported.'
 reason: 'invalid'>
```
I've used `FILE_LOADS` successfully in the past, and I think the failure could
be related to a high volume of incoming messages.
The output table uses DAY partitioning, and the write uses a
`triggering_frequency` of 5 minutes. Does anyone know why this could be
happening? Is there a set of parameters that could help resolve this issue?
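
For reference, the write configuration described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual pipeline code: the helper `file_loads_write_kwargs`, the table name, and the partition column `event_ts` are hypothetical placeholders. The keyword names (`method`, `triggering_frequency`, `additional_bq_parameters`) are parameters of `beam.io.WriteToBigQuery`, and the `timePartitioning` dict follows the BigQuery table resource format.

```python
from typing import Any, Dict, Optional


def file_loads_write_kwargs(
    table: str,
    triggering_frequency_sec: int = 300,
    partition_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for beam.io.WriteToBigQuery (hypothetical helper)."""
    # DAY partitioning; without "field", BigQuery partitions by ingestion time.
    time_partitioning: Dict[str, Any] = {"type": "DAY"}
    if partition_field is not None:
        # Column-based partitioning on the given TIMESTAMP/DATE column.
        time_partitioning["field"] = partition_field
    return {
        "table": table,
        # Same value as beam.io.WriteToBigQuery.Method.FILE_LOADS.
        "method": "FILE_LOADS",
        # Seconds between load jobs in a streaming pipeline (5 minutes here).
        "triggering_frequency": triggering_frequency_sec,
        "additional_bq_parameters": {"timePartitioning": time_partitioning},
    }


# Placeholder table and column names, for illustration only.
kwargs = file_loads_write_kwargs(
    "my-project:my_dataset.my_table", partition_field="event_ts"
)

# In the pipeline (requires apache_beam to be installed):
#   rows | "WriteToBQ" >> beam.io.WriteToBigQuery(**kwargs)
```

Whether passing the partitioning spec this way avoids the copy-job error is not confirmed here; the sketch only makes the configuration in question concrete.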
Someone reported something similar here:
https://stackoverflow.com/questions/68556242/pub-sub-to-bigquery-batch-using-dataflow-python
Thanks.
### Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
### Issue Components
- [x] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner