tomaslink opened a new issue, #38017:
URL: https://github.com/apache/beam/issues/38017

   ### What happened?
   
   I have a Python Dataflow streaming pipeline that reads from Pub/Sub, does some processing, and writes to BigQuery. Using `STORAGE_WRITE_API` works fine, but when I switch to `FILE_LOADS` to reduce costs, I get the following exception:
   
   ```text
   Error message from worker: generic::unknown: Traceback (most recent call 
last):
     File "apache_beam/runners/common.py", line 1498, in 
apache_beam.runners.common.DoFnRunner.process
     File "apache_beam/runners/common.py", line 912, in 
apache_beam.runners.common.PerWindowInvoker.invoke_process
     File "apache_beam/runners/common.py", line 1057, in 
apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
     File 
"/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_file_loads.py",
 line 528, in process
       self.process_one(element, job_name_prefix)
     File 
"/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_file_loads.py",
 line 575, in process_one
       self.bq_wrapper.wait_for_bq_job(job_reference, sleep_duration_sec=10)
     File 
"/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_tools.py", 
line 690, in wait_for_bq_job
       raise RuntimeError(
   RuntimeError: BigQuery job 
beam_bq_job_COPY_pipenmeafileloads_COPY_STEP_e2c58135db064a1e869f3633a7a1b037_0b8da1df1d54f6591407114d91108002
 failed. Error Result: <ErrorProto
    message: 'Failed to copy Non partitioned table to Column partitioned table: 
not supported.'
    reason: 'invalid'>
   ```
   
   I've run this configuration successfully in the past, so I suspect it may be related to a high volume of incoming messages.
   I'm using DAY partitioning on the output table and a `triggering_frequency` of 5 minutes. Does anyone know why this could be happening? Is there a set of parameters that would help resolve it?
   
   Someone reported something similar here:
   
https://stackoverflow.com/questions/68556242/pub-sub-to-bigquery-batch-using-dataflow-python
   
   Thanks.
   
   
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [x] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to