[ 
https://issues.apache.org/jira/browse/SQOOP-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Benes updated SQOOP-3014:
-------------------------------
    Attachment: oracle-sqoop-error.png

> Sqoop with HCatalog import loose precision for large numbers that does not 
> fit into double
> ------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-3014
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3014
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hive-integration
>    Affects Versions: 1.4.6
>            Reporter: Pavel Benes
>            Priority: Critical
>         Attachments: oracle-sqoop-error.png
>
>
> When using sqoop with HCatalog to import data from JDBC (I have tried 
> Oracle11) all numbers that does not fit into double are loosing its precision 
> or are distorted.
> Steps to reproduce:
> 1) Create test table in Oracle and fill it with test data
> {code}
> CREATE TABLE TEST_SQOOP_ERROR(ID VARCHAR(10), TYPE_NUMBER DECIMAL(22,5))
>  
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-1', 
> 454018528782.42006329)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-2', 
> 87658675864540185.123456789123456789) 
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-3', 
> 87658675864540185.12345)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-4', 
> 87658675864540185.123)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-5', 
> 7658675864540185.12345) 
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-6', 
> 7658675864540185.123456789) 
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-7', 
> 658675864540185.12345)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-8', 
> 58675864540185.12345)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-9', 
> 8675864540185.12345) 
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-10', 
> 675864540185.12345) 
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-11', 
> 75864540185.12345) 
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-12', 
> 35864540185.12345) 
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-13', 
> 5864540185.12345) 
> {code}
> 2) Create table in Hive database
> {code}
> CREATE TABLE pbe_test_sqoop_error(id string, type_number decimal(22,5)) 
> STORED AS ORC;
> {code}
> 3) Import data from Oracle to Hive using sqoop
> {code}
> export HADOOP_CLASSPATH=/opt/mantis/jdbc/oracle-11.2/ojdbc6.jar
> sqoop import -connect jdbc:oracle:thin:@//1.1.1.1:1521/XE --username  XXX 
> --password  XXX --hcatalog-database  default --hcatalog-table 
> pbe_test_sqoop_error  --driver oracle.jdbc.OracleDriver  --query 'SELECT id, 
> type_number FROM MMDINGEST.TEST_SQOOP_ERROR WHERE $CONDITIONS' -m 1
> {code}
> 4) Display data from Hive table
> {code}
> hive> select * from pbe_test_sqoop_error;
> OK
> row-1   454018528782.42004
> row-2   87658675864540192
> row-3   87658675864540192
> row-4   87658675864540192
> row-5   7658675864540185
> row-6   7658675864540185
> row-7   658675864540185.125
> row-8   58675864540185.125
> row-9   8675864540185.12305
> row-10  675864540185.12341
> row-11  75864540185.12344
> row-12  35864540185.12345
> row-13  5864540185.12345
> Time taken: 0.455 seconds, Fetched: 13 row(s)
> {code}
> Only the values at line 1, 12, 13 are correct. At the lines 2-4 even the part 
> of the number before dot is wrong. All looks correctly in Oracle as can be 
> seen on the attached screenshot.
> The problem seems to be in the java class 
> https://www.codatlas.com/github.com/apache/sqoop/branch-1.4.6/src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportHelper.java
>  starting at line 437.
> {code}
>     } else if (hfsType == HCatFieldSchema.Type.DECIMAL) {
>       BigDecimal bd = new BigDecimal(n.doubleValue(),
>         MathContext.DECIMAL128);
>       return HiveDecimal.create(bd);
>     }
> {code}
> all numbers, even those that are stored in BigDecimal are squeezed through 
> double which leads to the precision lost The same issue could be at some 
> places when working with large numbers.
> The following code fixes this issue:
> {code}
> } else if (hfsType == HCatFieldSchema.Type.DECIMAL) {
>       BigDecimal bd = val instanceof BigDecimal ?  (BigDecimal) val : new 
> BigDecimal(n.doubleValue(), MathContext.DECIMAL128);
>       return HiveDecimal.create(bd);
>     }
> {code} . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to