Gene Peters created BEAM-4835:
---------------------------------
Summary: Add more flexible options for data loading to
BigQueryIO.Write
Key: BEAM-4835
URL: https://issues.apache.org/jira/browse/BEAM-4835
Project: Beam
Issue Type: Improvement
Components: io-java-gcp
Reporter: Gene Peters
Assignee: Chamikara Jayalath
As part of the BigQuery API, there are a few options exposed to end-users which
allow for more flexible data loading.
For both
[streaming|https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableDataInsertAllRequest.html#setIgnoreUnknownValues-java.lang.Boolean-]
and
[batch|https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setIgnoreUnknownValues-java.lang.Boolean-]
inserts, the flag "ignoreUnknownValues" can be set, which indicates whether
BigQuery should accept rows that contain values that do not match the schema.
When it is set, the unknown values are ignored instead of being treated as
errors.
[In
addition,|https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableDataInsertAllRequest.html#setSkipInvalidRows-java.lang.Boolean-]
streaming inserts support the "skipInvalidRows" flag, which accepts the valid
rows of an inserted batch even if some of the rows are invalid.
I've made the necessary code changes to make this available within
BigQueryIO.Write and will be attaching the pull request to this ticket for
review. Both flags are off by default.
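To illustrate the semantics described above, here is a minimal sketch in plain Java. It is a toy model only, not the BigQuery or Beam implementation: the class InsertFlagsSketch, the method insertAll, and RequestRejectedException are hypothetical names, and "invalid" is simplified to mean "contains a field that is not in the schema".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the two flags; hypothetical, not BigQuery's or Beam's code. */
class InsertFlagsSketch {

  /** Thrown when the whole request is rejected. */
  static class RequestRejectedException extends RuntimeException {
    RequestRejectedException(String message) {
      super(message);
    }
  }

  /**
   * Returns the rows that would be stored for one insert request.
   * Simplifying assumption: a row is invalid only if it carries a field
   * missing from the schema while ignoreUnknownValues is false.
   */
  static List<Map<String, Object>> insertAll(
      Set<String> schemaFields,
      List<Map<String, Object>> rows,
      boolean ignoreUnknownValues,
      boolean skipInvalidRows) {
    List<Map<String, Object>> accepted = new ArrayList<>();
    for (Map<String, Object> row : rows) {
      boolean hasUnknownField = !schemaFields.containsAll(row.keySet());
      if (hasUnknownField && !ignoreUnknownValues) {
        if (skipInvalidRows) {
          continue; // drop only this row, keep the rest of the batch
        }
        // Default behaviour: one invalid row fails the entire request.
        throw new RequestRejectedException("row has unknown fields: " + row);
      }
      // Keep the row, discarding any fields the schema does not define.
      Map<String, Object> kept = new LinkedHashMap<>(row);
      kept.keySet().retainAll(schemaFields);
      accepted.add(kept);
    }
    return accepted;
  }

  public static void main(String[] args) {
    Set<String> schema = Set.of("id", "name");
    Map<String, Object> good = Map.of("id", 1, "name", "a");
    Map<String, Object> extra = Map.of("id", 2, "name", "b", "extra", true);
    List<Map<String, Object>> rows = List.of(good, extra);

    // ignoreUnknownValues on: both rows stored, "extra" field discarded.
    System.out.println(insertAll(schema, rows, true, false).size());
    // skipInvalidRows on: only the row matching the schema is stored.
    System.out.println(insertAll(schema, rows, false, true).size());
    // Both flags off (the default): the whole request is rejected.
    try {
      insertAll(schema, rows, false, false);
    } catch (RequestRejectedException e) {
      System.out.println("rejected");
    }
  }
}
```

In this model the two flags are independent: "ignoreUnknownValues" changes what counts as an acceptable row, while "skipInvalidRows" changes what happens to the rest of the batch once a row has been judged invalid.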
Let me know if you have any questions or feedback about this!