[ 
https://issues.apache.org/jira/browse/AIRFLOW-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499414#comment-16499414
 ] 

ASF subversion and git services commented on AIRFLOW-2500:
----------------------------------------------------------

Commit b7dc315101a0fc924f76e5c5f500c96e85bdd672 in incubator-airflow's branch 
refs/heads/master from [~sekikn]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=b7dc315 ]

[AIRFLOW-2500] Fix MySqlToHiveTransfer to transfer unsigned type properly

MySQL supports unsigned data types, but Hive
doesn't.
So if MySqlToHiveTransfer maps MySQL's data types
to
Hive's corresponding ones directly (e.g. INT ->
INT),
unsigned values over signed type's upper bound
transferred from MySQL are interpreted as invalid
by Hive, and users get NULL.
To avoid it, this PR fixes MySqlToHiveTransfer
to map MySQL data types to Hive's wider ones
(e.g. SMALLINT -> INT, INT -> BIGINT, etc.).

Closes #3446 from sekikn/AIRFLOW-2500


> Fix MySqlToHiveTransfer to transfer unsigned type properly
> ----------------------------------------------------------
>
>                 Key: AIRFLOW-2500
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2500
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: operators
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Major
>
> Given the following table,
> {code}
> mysql> USE airflow_ci
> Database changed
> mysql> DESC users;
> +-------+------------------+------+-----+---------+-------+
> | Field | Type             | Null | Key | Default | Extra |
> +-------+------------------+------+-----+---------+-------+
> | id    | int(10) unsigned | YES  |     | NULL    |       |
> +-------+------------------+------+-----+---------+-------+
> 1 row in set (0.00 sec)
> mysql> SELECT * FROM users;
> +------------+
> | id         |
> +------------+
> | 2147483647 |
> | 2147483648 |
> +------------+
> 2 rows in set (0.00 sec)
> {code}
> executing MySqlToHiveTransfer:
> {code}
> In [1]: from airflow.operators.mysql_to_hive import MySqlToHiveTransfer
>    ...: t = MySqlToHiveTransfer(sql="SELECT * FROM airflow_ci.users", 
> hive_table="users", recreate=True, task_id="t")
>    ...: t.execute(None)
>    ...: 
> [2018-05-21 12:14:09,137] {base_hook.py:83} INFO - Using connection to: 
> localhost
> [2018-05-21 12:14:09,140] {base_hook.py:83} INFO - Using connection to: 
> localhost
> [2018-05-21 12:14:09,146] {hive_hooks.py:427} INFO - DROP TABLE IF EXISTS 
> users;
> CREATE TABLE IF NOT EXISTS users (
> id INT)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ''
> STORED AS textfile
> ;
> (snip)
> [2018-05-21 12:14:31,667] {hive_hooks.py:233} INFO - Loading data to table 
> default.users
> [2018-05-21 12:14:32,364] {hive_hooks.py:233} INFO - Table default.users 
> stats: [numFiles=1, numRows=0, totalSize=24, rawDataSize=0]
> [2018-05-21 12:14:32,365] {hive_hooks.py:233} INFO - OK
> [2018-05-21 12:14:32,366] {hive_hooks.py:233} INFO - Time taken: 1.299 seconds
> {code}
> ... then the value greater than the upper bound for signed integer is not 
> properly fetched from Hive.
> {code}
> hive> SELECT * FROM users;
> OK
> 2147483647
> NULL
> Time taken: 2.461 seconds, Fetched: 2 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to