[
https://issues.apache.org/jira/browse/AIRFLOW-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080391#comment-17080391
]
ASF GitHub Bot commented on AIRFLOW-7118:
-----------------------------------------
mik-laj commented on pull request #8090: [AIRFLOW-7118] Honor schema type for
MySQL to GCS data pre-process
URL: https://github.com/apache/airflow/pull/8090
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Honor schema type for mysql to gcs data pre-process
> ---------------------------------------------------
>
> Key: AIRFLOW-7118
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7118
> Project: Apache Airflow
> Issue Type: Improvement
> Components: operators
> Affects Versions: 1.10.5, 1.10.6, 1.10.7, 1.10.8, 1.10.9
> Reporter: Hongyi Wang
> Assignee: Hongyi Wang
> Priority: Minor
>
> After AIRFLOW-7117 is done, we re-gain the ability to override (BigQuery)
> schema with customize
> [self.schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/sql_to_gcs.py#L61-L65](if
> set). However, we still pre-process sql value based on default schema rather
> than [override
> schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L112-L113].
>
> For example, [mysql_to_gcs default to map (mysql)`FIELD_TYPE.DATE` to
> (bigquery)`TIMESTAMP`|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L49].
> Suppose we want to load such field to BigQuery as `DATE`, we would provide
> [our own
> schema|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/sql_to_gcs.py#L61-L65].
> Unfortunately, since
> [convert_type()|https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/mysql_to_gcs.py#L102]
> still pre-process sql value based on default schema, (python)`date` will be
> converted
> timestamp|[https://github.com/stoynov96/airflow/blob/master/airflow/operators/mysql_to_gcs.py#L118-L119]].
> As a result, BigQuery API returns error:
> {code:java}
> Could not convert non-string JSON value to DATE type{code}
> To fix that, we need to update
> [convert_type()|https://github.com/stoynov96/airflow/blob/master/airflow/operators/mysql_to_gcs.py#L103]
> to always honor `schema_type` for data pre-preprocess. e.g.
> [DATE|https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type]
> {code:java}
> if schema_type == "DATE" and isinstance(value, date):
> # In format of 'YYYY-[M]M-[D]D'
> return value.isoformat(){code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)