[
https://issues.apache.org/jira/browse/AIRFLOW-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hongyi Wang updated AIRFLOW-7118:
---------------------------------
Description:
After AIRFLOW-7117 is done, we re-gain the ability to override (BigQuery)
schema with customize
[self.schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/sql_to_gcs.py#L61-L65](if
set). However, we still pre-process sql value based on default schema rather
than [override
schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L112-L113].
For example, [mysql_to_gcs default to map (mysql)`FIELD_TYPE.DATE` to
(bigquery)`TIMESTAMP`|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L49].
Suppose we want to load such field to BigQuery as `DATE`, we would provide
[our own
schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/sql_to_gcs.py#L61-L65].
Unfortunately, since
[convert_type()|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L102]
still pre-process sql value based on default schema, (python)`date` will be
converted
timestamp|[https://github.com/stoynov96/airflow/blob/master/airflow/operators/mysql_to_gcs.py#L118-L119]].
As a result, BigQuery API returns error:
{code:java}
Could not convert non-string JSON value to DATE type{code}
To fix that, we need to update
[convert_type()|https://github.com/stoynov96/airflow/blob/master/airflow/operators/mysql_to_gcs.py#L103]
to always honor `schema_type` for data pre-preprocess. e.g.
[DATE|https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type]
{code:java}
if schema_type == "DATE" and isinstance(value, date):
# In format of 'YYYY-[M]M-[D]D'
return value.isoformat(){code}
was:
After AIRFLOW-7117 is done, we re-gain the ability to override (BigQuery)
schema with customize
[self.schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/sql_to_gcs.py#L61-L65](if
set). However, we still pre-process sql value based on default schema rather
than [override
schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L112-L113].
For example, [mysql_to_gcs default to map (mysql)`FIELD_TYPE.DATE` to
(bigquery)`TIMESTAMP`|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L49].
Suppose we want to load such field to BigQuery as `DATE`, we would provide
[our own
schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/sql_to_gcs.py#L61-L65].
Unfortunately, since
[convert_type()|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L102]
still pre-process sql value based on default schema, (python)`date` will be
converted
timestamp|[https://github.com/stoynov96/airflow/blob/master/airflow/operators/mysql_to_gcs.py#L118-L119]].
As a result, BigQuery API returns error:
{code:java}
Could not convert non-string JSON value to DATE type{code}
To fix that, we need to update
[convert_type()|https://github.com/stoynov96/airflow/blob/master/airflow/operators/mysql_to_gcs.py#L103]
to always honor `schema_type` for data pre-preprocess. e.g.
{code:java}
if schema_type == "DATE" and isinstance(value, date):
return value.isoformat(){code}
> Honor schema type for mysql to gcs data pre-process
> ---------------------------------------------------
>
> Key: AIRFLOW-7118
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7118
> Project: Apache Airflow
> Issue Type: Improvement
> Components: operators
> Affects Versions: 1.10.5, 1.10.6, 1.10.7, 1.10.8, 1.10.9
> Reporter: Hongyi Wang
> Assignee: Hongyi Wang
> Priority: Minor
>
> After AIRFLOW-7117 is done, we re-gain the ability to override (BigQuery)
> schema with customize
> [self.schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/sql_to_gcs.py#L61-L65](if
> set). However, we still pre-process sql value based on default schema rather
> than [override
> schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L112-L113].
>
> For example, [mysql_to_gcs default to map (mysql)`FIELD_TYPE.DATE` to
> (bigquery)`TIMESTAMP`|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L49].
> Suppose we want to load such field to BigQuery as `DATE`, we would provide
> [our own
> schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/sql_to_gcs.py#L61-L65].
> Unfortunately, since
> [convert_type()|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L102]
> still pre-process sql value based on default schema, (python)`date` will be
> converted
> timestamp|[https://github.com/stoynov96/airflow/blob/master/airflow/operators/mysql_to_gcs.py#L118-L119]].
> As a result, BigQuery API returns error:
> {code:java}
> Could not convert non-string JSON value to DATE type{code}
> To fix that, we need to update
> [convert_type()|https://github.com/stoynov96/airflow/blob/master/airflow/operators/mysql_to_gcs.py#L103]
> to always honor `schema_type` for data pre-preprocess. e.g.
> [DATE|https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type]
> {code:java}
> if schema_type == "DATE" and isinstance(value, date):
> # In format of 'YYYY-[M]M-[D]D'
> return value.isoformat(){code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)