Sayat Satybaldiyev created BEAM-12669:
-----------------------------------------
Summary: UpdateDestinationSchema PTransform does not respect
source format
Key: BEAM-12669
URL: https://issues.apache.org/jira/browse/BEAM-12669
Project: Beam
Issue Type: Bug
Components: io-go-gcp, runner-dataflow
Affects Versions: 2.30.0
Reporter: Sayat Satybaldiyev
When multiple load jobs are needed to write data to a destination table, e.g.
when the data is spread over more than 10,000 URIs, WriteToBigQuery in
FILE_LOADS mode writes the data into temporary tables and then updates the
temporary tables' schemas if schema additions are allowed.
However, the temporary-table schema update does not respect the specified
source format of the files being loaded (i.e. JSON, AVRO).
UpdateDestinationSchema issues the schema modification command with the
default CSV setting, which causes AVRO or JSON nested-schema loads to fail
with the error:
{code:java}
apache_beam.io.gcp.bigquery_file_loads: INFO: Triggering schema modification
job
beam_bq_job_LOAD_satybald7_SCHEMA_MOD_STEP_994_3869e4dc1dd08c68d20fd047e242161a_7c553f684cce4963a75d669f38a4ec46
on <TableReference
datasetId: 'python_write_to_table_1627431111435'
projectId: 'DELETED'
tableId: 'python_append_schema_update'>
apache_beam.io.gcp.bigquery_tools: INFO: Failed to insert job <JobReference
jobId:
'beam_bq_job_LOAD7_SCHEMA_MOD_STEP_994_3869e4dc1dd08c68d20fd047e242161a_7c553f684cce4963a75d669f38a4ec46'
projectId: 'DELETED'>: HttpError accessing ....
'content-type': 'application/json; charset=UTF-8', 'content-length': '332',
'date': 'Wed, 28 Jul 2021 00:12:03 GMT', 'server': 'UploadServer', 'status':
'400'}>, content <{
"error": {
"code": 400,
"message": "Cannot load CSV data with a nested schema. Field: nested_field",
"errors": [
{
"message": "Cannot load CSV data with a nested schema. Field:
nested_field",
"domain": "global",
"reason": "invalid"
}
],
"status": "INVALID_ARGUMENT"
}
}
{code}
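A minimal sketch of the failure mode, using a hypothetical helper (not Beam's actual API): a BigQuery load-job configuration that does not set {{sourceFormat}} falls back to BigQuery's default, CSV, which rejects nested schemas. The fix is to propagate the format used for the temporary files into the schema-modification job:

```python
def make_schema_mod_load_config(uris, schema_update_options,
                                source_format=None):
    """Build a BigQuery load-job configuration dict (illustrative only).

    BigQuery treats a missing sourceFormat as CSV, so the caller must
    forward the format used to write the temp files (e.g.
    NEWLINE_DELIMITED_JSON or AVRO), or loads of nested fields fail with
    "Cannot load CSV data with a nested schema".
    """
    return {
        "sourceUris": uris,
        "schemaUpdateOptions": schema_update_options,
        # The fix: carry the pipeline's source format through explicitly
        # instead of relying on BigQuery's implicit CSV default.
        "sourceFormat": source_format or "CSV",
    }


# Without forwarding the format, the schema-mod job defaults to CSV:
broken = make_schema_mod_load_config(["gs://bucket/temp-*"],
                                     ["ALLOW_FIELD_ADDITION"])
# Forwarding the write format (here AVRO) avoids the nested-schema error:
fixed = make_schema_mod_load_config(["gs://bucket/temp-*"],
                                    ["ALLOW_FIELD_ADDITION"],
                                    source_format="AVRO")
```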
--
This message was sent by Atlassian Jira
(v8.3.4#803005)