gecko655 created AIRFLOW-6050:
---------------------------------

             Summary: Missing an argument `null_marker` in 
GoogleCloudStorageToBigQueryOperator
                 Key: AIRFLOW-6050
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6050
             Project: Apache Airflow
          Issue Type: Bug
          Components: hooks, operators
    Affects Versions: 1.10.3
            Reporter: gecko655


h1. Summary

We need the `null_marker` argument in GoogleCloudStorageToBigQueryOperator.
The spec of this argument is documented here:
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job?hl=ja#jobconfigurationload
The related implementation is here:
https://github.com/apache/airflow/blob/09ccf296fc0595be8a0bb5802eb2df5d2948889b/airflow/operators/gcs_to_bq.py#L33

h1. Situation and reproduction steps

I could not load a CSV file into a BigQuery table because the file contains the 
literal string `null` in a column that the table schema declares as `TIMESTAMP`.
We can avoid this by specifying the `null_marker` option.

Suppose we have a CSV file like:
{code:c}
start_time,end_time
'2019-11-23 16:49:00',null
{code}
and a schema definition like:
{code:json}
[
  {
    "mode": "NULLABLE",
    "name": "start_time",
    "type": "TIMESTAMP"
  },
  {
    "mode": "NULLABLE",
    "name": "end_time",
    "type": "TIMESTAMP"
  }
]
{code}

By running GoogleCloudStorageToBigQueryOperator in this situation, we get an 
error like:

bq. Could not parse 'null' as a timestamp. Required format is YYYY-MM-DD 
HH:MM[:SS[.SSSSSS]]; Could not parse 'null' as datetime for field end_time
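
To illustrate why the error occurs, here is a minimal local simulation in Python. It is not BigQuery's parser; `parse_timestamp` and `load_rows` are hypothetical helpers, and the single-quote `quotechar` is an assumption made only so the sample row above can be read as written.

```python
import csv
import io
from datetime import datetime

# The sample file from this report.
CSV_TEXT = "start_time,end_time\n'2019-11-23 16:49:00',null\n"


def parse_timestamp(value, null_marker=None):
    """Return None when value equals the null marker, else parse a timestamp."""
    if null_marker is not None and value == null_marker:
        return None
    # Raises ValueError for anything that is not a timestamp, e.g. 'null'.
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def load_rows(text, null_marker=None):
    """Parse every column of every row as a nullable timestamp."""
    reader = csv.DictReader(io.StringIO(text), quotechar="'")
    return [
        {name: parse_timestamp(value, null_marker) for name, value in row.items()}
        for row in reader
    ]
```

Calling `load_rows(CSV_TEXT)` raises `ValueError` on the `end_time` value, while `load_rows(CSV_TEXT, null_marker="null")` yields `None` for that field.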

Without Airflow's GoogleCloudStorageToBigQueryOperator, we can run this load 
manually with the option `--null_marker='null'`.
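
As a rough sketch of what the operator would need to pass through, the load job configuration in the REST API linked above accepts a `nullMarker` field. The helper below is hypothetical (`build_load_configuration` is not an existing Airflow function, and the bucket, project, dataset, and table names are placeholders); it only shows where the value would go.

```python
# Schema from this report.
SCHEMA_FIELDS = [
    {"mode": "NULLABLE", "name": "start_time", "type": "TIMESTAMP"},
    {"mode": "NULLABLE", "name": "end_time", "type": "TIMESTAMP"},
]


def build_load_configuration(source_uris, project, dataset, table,
                             schema_fields, null_marker=None):
    """Build a BigQuery load job configuration dict (hypothetical helper)."""
    load = {
        "sourceUris": source_uris,
        "destinationTable": {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table,
        },
        "sourceFormat": "CSV",
        "skipLeadingRows": 1,  # skip the header row of the sample file
        "schema": {"fields": schema_fields},
    }
    # Only set nullMarker when the caller asked for one, so the
    # API default applies otherwise.
    if null_marker is not None:
        load["nullMarker"] = null_marker
    return {"load": load}


config = build_load_configuration(
    ["gs://example-bucket/data.csv"],  # placeholder URI
    "example-project", "example_dataset", "example_table",
    SCHEMA_FIELDS,
    null_marker="null",
)
```

With `null_marker="null"`, the resulting `config["load"]["nullMarker"]` is `"null"`, which corresponds to the manual `--null_marker='null'` option.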

h1. Related issues

https://issues.apache.org/jira/browse/AIRFLOW-5224


