gecko655 created AIRFLOW-6050:
---------------------------------
Summary: Missing an argument `null_marker` in
GoogleCloudStorageToBigQueryOperator
Key: AIRFLOW-6050
URL: https://issues.apache.org/jira/browse/AIRFLOW-6050
Project: Apache Airflow
Issue Type: Bug
Components: hooks, operators
Affects Versions: 1.10.3
Reporter: gecko655
h1. Summary
We need the `null_marker` argument in GoogleCloudStorageToBigQueryOperator.
The spec of this argument is documented here:
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job?hl=ja#jobconfigurationload
The related implementation is here:
https://github.com/apache/airflow/blob/09ccf296fc0595be8a0bb5802eb2df5d2948889b/airflow/operators/gcs_to_bq.py#L33
h1. Situation and reproduction
I could not load a CSV file into a BigQuery table because the file contains
the literal string `null` in a column whose schema type is `TIMESTAMP`.
We can avoid this by specifying the `null_marker` option.
Suppose we have a CSV file like:
{code:none}
start_time,end_time
'2019-11-23 16:49:00',null
{code}
and a schema definition like:
{code:json}
[
  {
    "mode": "NULLABLE",
    "name": "start_time",
    "type": "TIMESTAMP"
  },
  {
    "mode": "NULLABLE",
    "name": "end_time",
    "type": "TIMESTAMP"
  }
]
{code}
Running GoogleCloudStorageToBigQueryOperator in this situation fails with an
error like:
bq. Could not parse 'null' as a timestamp. Required format is YYYY-MM-DD
HH:MM[:SS[.SSSSSS]]; Could not parse 'null' as datetime for field end_time
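To illustrate why the load fails and what a null marker changes, here is a small local sketch. It is not BigQuery's parser: the helper names `parse_timestamp_cell` and `load_rows` are hypothetical, and stripping the single quotes around the timestamp is an assumption made only to keep the example short.

```python
import csv
import io
from datetime import datetime

# The sample file from this report, inlined for illustration.
CSV_TEXT = "start_time,end_time\n'2019-11-23 16:49:00',null\n"


def parse_timestamp_cell(cell, null_marker=""):
    """Return None when the cell equals the null marker, else parse a timestamp.

    Hypothetical helper: it only mimics the idea of a null marker.
    """
    if cell == null_marker:
        return None
    # Assumption: surrounding single quotes are dropped before parsing.
    return datetime.strptime(cell.strip("'"), "%Y-%m-%d %H:%M:%S")


def load_rows(text, null_marker=""):
    """Parse every cell of every row as a nullable timestamp."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: parse_timestamp_cell(value, null_marker) for name, value in row.items()}
        for row in reader
    ]


if __name__ == "__main__":
    try:
        load_rows(CSV_TEXT)  # no marker: 'null' is parsed as a timestamp
    except ValueError as exc:
        print("load failed:", exc)
    print(load_rows(CSV_TEXT, null_marker="null"))
```

Without a marker, the string `null` reaches the timestamp parser and raises an error, which mirrors the failure above. With `null_marker="null"`, the same cell becomes an empty value.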
Outside Airflow, without GoogleCloudStorageToBigQueryOperator, the same load
succeeds when run manually with the option `--null_marker='null'`.
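As a rough sketch of what the operator would need to pass through, the BigQuery load job configuration documented at the link above has a `nullMarker` field. The function below is hypothetical (it is not Airflow's API, and the bucket and table names are placeholders); it only shows where the value would land in the configuration body.

```python
def build_load_configuration(source_uris, destination, schema_fields, null_marker=None):
    """Build a BigQuery job configuration dict for a CSV load.

    Hypothetical helper for illustration only. `destination` is assumed to be
    a 'project.dataset.table' string.
    """
    project_id, dataset_id, table_id = destination.split(".")
    load = {
        "sourceUris": list(source_uris),
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "sourceFormat": "CSV",
        "skipLeadingRows": 1,  # the sample file has one header row
        "schema": {"fields": list(schema_fields)},
    }
    # Only set nullMarker when the caller asked for one.
    if null_marker is not None:
        load["nullMarker"] = null_marker
    return {"load": load}


if __name__ == "__main__":
    schema = [
        {"mode": "NULLABLE", "name": "start_time", "type": "TIMESTAMP"},
        {"mode": "NULLABLE", "name": "end_time", "type": "TIMESTAMP"},
    ]
    config = build_load_configuration(
        ["gs://example-bucket/times.csv"],  # placeholder URI
        "example-project.example_dataset.times",  # placeholder table
        schema,
        null_marker="null",
    )
    print(config["load"]["nullMarker"])
```

Leaving `null_marker` unset omits the field entirely, so existing callers would keep the current behaviour.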
h1. Related issues
https://issues.apache.org/jira/browse/AIRFLOW-5224