Sergei Lilichenko created BEAM-13158:
----------------------------------------
Summary: Improve BigQueryIO Storage Write API data validation error handling
Key: BEAM-13158
URL: https://issues.apache.org/jira/browse/BEAM-13158
Project: Beam
Issue Type: New Feature
Components: io-java-gcp
Affects Versions: 2.33.0
Reporter: Sergei Lilichenko
A single invalid row currently causes the BigQueryIO transform, and with it the
whole pipeline, to fail. The desired behavior is to make the error handling
configurable: either fail on any validation failure (the current behavior) or
return the list of failed records through the WriteResult.
The exception can occur in two places: the JSON-to-protobuf conversion and the
BigQuery backend.
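To make the request concrete, here is a minimal sketch of the two behaviors in plain Java. It does not use Beam or BigQuery classes. Every name in it (ErrorHandlingSketch, Policy, Stage, Row, FailedRow, RowSink, writeAll, toySink) is hypothetical and only illustrates the idea. The toy sink mimics the two failure points with a numeric check on bytes_sent and a 15-character limit on dst_ip.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names are Beam or BigQuery APIs.
public class ErrorHandlingSketch {

    // The two behaviors requested in this issue.
    enum Policy { FAIL_ON_ERROR, RETURN_FAILED_ROWS }

    // Where a row was rejected.
    enum Stage { CONVERSION, BACKEND }

    // Simplified input row with the two fields from the example errors.
    static class Row {
        final String bytesSent;
        final String dstIp;
        Row(String bytesSent, String dstIp) { this.bytesSent = bytesSent; this.dstIp = dstIp; }
    }

    // Raised by the stand-in write step when a row is invalid.
    static class RowException extends RuntimeException {
        final Stage stage;
        RowException(Stage stage, String message) { super(message); this.stage = stage; }
    }

    // A rejected row together with the stage and message of its failure.
    static class FailedRow {
        final Row row;
        final Stage stage;
        final String error;
        FailedRow(Row row, Stage stage, String error) {
            this.row = row; this.stage = stage; this.error = error;
        }
    }

    // Stand-in for "convert the row and append it to the table".
    interface RowSink { void accept(Row row); }

    // Writes every row; either rethrows the first failure or collects all failures.
    static List<FailedRow> writeAll(List<Row> rows, RowSink sink, Policy policy) {
        List<FailedRow> failed = new ArrayList<>();
        for (Row row : rows) {
            try {
                sink.accept(row);
            } catch (RowException e) {
                if (policy == Policy.FAIL_ON_ERROR) {
                    throw e; // current behavior: one bad row fails everything
                }
                failed.add(new FailedRow(row, e.stage, e.getMessage()));
            }
        }
        return failed;
    }

    // Toy checks that imitate the two failure points described above.
    static void toySink(Row row) {
        try {
            Long.parseLong(row.bytesSent); // imitates the string vs INTEGER mismatch
        } catch (NumberFormatException e) {
            throw new RowException(Stage.CONVERSION, "bytes_sent is not an INTEGER");
        }
        if (row.dstIp.length() > 15) {     // imitates the STRING(15) length limit
            throw new RowException(Stage.BACKEND, "dst_ip exceeds maximum length 15");
        }
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("1024", "10.0.0.1"),
            new Row("lots", "10.0.0.2"),
            new Row("2048", "this-is-not-an-ip-address"));
        for (FailedRow f : writeAll(rows, ErrorHandlingSketch::toySink, Policy.RETURN_FAILED_ROWS)) {
            System.out.println(f.stage + ": " + f.error);
        }
    }
}
```

With RETURN_FAILED_ROWS the valid rows are still written and the caller decides what to do with the rejected ones, for example routing them to a dead-letter destination. With FAIL_ON_ERROR the first invalid row propagates as an exception, as it does today.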
Example of the exception caused by the conversion:
io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The proto field mismatched
with BigQuery field at D586b3f9a_1543_4dbe_87ff_ef786d6803c2.bytes_sent, the
proto field type string, BigQuery field type INTEGER Entity:
projects/event-processing-demo/datasets/bigquery_io/tables/events/streams/Cic2MzUyMTYxYy0wMDAwLTI2MjktOGVjYy1mNDAzMDQ1ZWY5Y2U6czI
Example of the exception caused by the BigQuery backend:
io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Field dst_ip: STRING(15) has
maximum length 15 but got a value with length 54 Entity:
projects/event-processing-demo/datasets/bigquery_io/tables/events/streams/CiQ2MzRkOGM5Mi0wMDAwLTI2MjktOGVjYy1mNDAzMDQ1ZWY5Y2U