TobiasBredow opened a new issue, #32155: URL: https://github.com/apache/beam/issues/32155
### What happened?

I noticed some differences when switching ingestion from STREAMING_INSERTS to STORAGE_WRITE_API in the `WriteToBigQuery` transform (using the Python API). With the old ingestion method it is possible to omit repeated fields (or pass them in empty) and they default to an empty list; the same input fails as soon as the newer STORAGE_WRITE_API is used.

Because the new path converts the input dict to a Beam Row before handing it to the Java API, it runs into an error in [beam_row_from_dict](https://github.com/apache/beam/blob/2f93d8bc19917f83d15f531bcbbfb7f36e21ff88/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1567): fields that are not present are converted to `None`, and if such a field is a repeated struct/record the conversion then fails when it tries to iterate over `None` in line [1601](https://github.com/apache/beam/blob/2f93d8bc19917f83d15f531bcbbfb7f36e21ff88/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1601).

Is this intended new behavior? It forces us to always add an empty list to the dict before sending it to the `WriteToBigQuery` transform (a sketch of that workaround is below the component list), and with several such fields in a high-frequency source this adds to the data-processed costs on Dataflow. If failing on empty repeated fields is not by design, I would be happy to adjust this behavior myself, since it looks like a small and easy fix to me.

### Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

### Issue Components

- [X] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
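For illustration, here is a minimal sketch of the workaround described above: explicitly defaulting every missing repeated field to `[]` before `WriteToBigQuery`. The schema, field names, and table spec are made up for the example and are not from any real pipeline or from the Beam code base.

```python
import apache_beam as beam

# Illustrative schema: one required field plus a REPEATED RECORD that may be
# missing entirely from upstream dicts.
TABLE_SCHEMA = {
    "fields": [
        {"name": "id", "type": "STRING", "mode": "REQUIRED"},
        {
            "name": "events",
            "type": "RECORD",
            "mode": "REPEATED",
            "fields": [{"name": "ts", "type": "TIMESTAMP", "mode": "NULLABLE"}],
        },
    ]
}


def default_repeated_fields(row, fields=TABLE_SCHEMA["fields"]):
    """Fill missing repeated fields with [] so the Storage Write API path
    does not hit the None iteration in beam_row_from_dict."""
    row = dict(row)  # do not mutate the input element
    for field in fields:
        if field["mode"] == "REPEATED" and row.get(field["name"]) is None:
            row[field["name"]] = []
    return row


with beam.Pipeline() as p:
    (
        p
        | beam.Create([{"id": "a"}])  # 'events' is absent here
        | beam.Map(default_repeated_fields)
        | beam.io.WriteToBigQuery(
            "my-project:my_dataset.my_table",  # placeholder table spec
            schema=TABLE_SCHEMA,
            method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
        )
    )
```

This only works around the problem; the fix I have in mind would instead treat a missing repeated field as an empty list inside `beam_row_from_dict` itself, so callers do not have to pad their dicts.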
