pricealexandra opened a new issue, #34571:
URL: https://github.com/apache/beam/issues/34571

   ### What happened?
   
   I'm not sure if this is actually a bug or expected behavior, but I noticed 
that I need to add clustering fields explicitly when trying to write to a 
partition on a table that has clustered fields, like so:
   ```
   import functools
   from typing import Any

   import apache_beam as beam
   import pendulum

   def decorate_bq_table_with_daily_partition(table_ref: str, element: Any) -> str:
       timestamp = element["timestamp"]
       dt = pendulum.from_timestamp(timestamp, tz="UTC")
       # Append the $YYYYMMDD partition decorator for the element's date
       return f"{table_ref}${dt.format('YYYYMMDD')}"

   ...other transform code here...

   beam.io.WriteToBigQuery(
       table=functools.partial(decorate_bq_table_with_daily_partition, table),
       write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
       additional_bq_parameters={
           "clustering": {"fields": clustered_fields},
       },
   )
   ```
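   For reference, the decorator above only appends a `$YYYYMMDD` suffix derived from a UNIX timestamp; an equivalent sketch using only the standard library (pendulum is not actually required for this part) would be:

```python
from datetime import datetime, timezone


def decorate_bq_table_with_daily_partition(table_ref: str, element: dict) -> str:
    # Interpret the element's UNIX timestamp as UTC and append the
    # $YYYYMMDD partition decorator that BigQuery expects.
    dt = datetime.fromtimestamp(element["timestamp"], tz=timezone.utc)
    return f"{table_ref}${dt.strftime('%Y%m%d')}"
```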
   I get an "incompatible clustering fields" error if I don't include the `additional_bq_parameters` block. This appears to be because `WriteToBigQuery` creates its temporary tables without any clustering fields, and those tables then cannot be copied into the existing destination table, which does have clustering fields.
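   
   If defining them explicitly does turn out to be required, one possible workaround (a sketch, not verified against a live project) is to read the destination table's `clustering_fields` with the `google-cloud-bigquery` client's `Client.get_table` at pipeline-construction time and build the parameters from that, e.g. with a small helper:

```python
from typing import Optional


def clustering_params(fields: Optional[list]) -> dict:
    # Build the additional_bq_parameters entry matching an existing
    # table's clustering; returns {} when the table is not clustered
    # (clustering_fields is None in that case).
    if not fields:
        return {}
    return {"clustering": {"fields": list(fields)}}


# In a real pipeline the fields would come from something like:
#   client = google.cloud.bigquery.Client()
#   fields = client.get_table("project.dataset.table").clustering_fields
```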
   
   Is it expected that we need to define the clustering fields explicitly, or should the temporary tables inherit the clustering fields of the destination table when it already exists? If this is expected behavior, it might be helpful to add a note to the docs.
   
   ### Issue Priority
   
   Priority: 3 (minor)
   
   ### Issue Components
   
   - [x] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner

