[jira] [Commented] (SQOOP-1600) Exception when import data using Data Connector for Oracle with TIMESTAMP column type to Parquet files

wu (JIRA) Tue, 18 Nov 2014 00:05:02 -0800

    [ 
https://issues.apache.org/jira/browse/SQOOP-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215866#comment-14215866
 ]


wu commented on SQOOP-1600:
---------------------------

It seems off. Here is what I did:
I). Import SCOTT.EMP into HDFS with Sqoop:
sqoop  import -Doraoop.timestamp.string=false --connect 
jdbc:oracle:thin:@10.1.10.61:1521:ORCL --username YY_YY --password 12345678 
--table SCOTT.EMP --target-dir /user/hive/tmpta  --as-parquetfile 
--fields-terminated-by '$'  -m  1

II). Create an external table for Impala, with the HIREDATE column mapped to bigint:
[node01.hd.com:21000] > CREATE EXTERNAL TABLE aaa (EMPNO varchar(200)  ,ENAME 
varchar(20) ,JOB varchar(20) ,MGR varchar(200) ,HIREDATE bigint ,SAL 
varchar(200),COMM varchar(200),DEPTNO varchar(200))ROW FORMAT DELIMITED FIELDS 
TERMINATED BY '$' STORED AS PARQUET  LOCATION '/user/hive/tmpta';   
Query: create EXTERNAL TABLE aaa (EMPNO varchar(200)  ,ENAME varchar(20) ,JOB 
varchar(20) ,MGR varchar(200) ,HIREDATE bigint ,SAL varchar(200),COMM 
varchar(200),DEPTNO varchar(200))ROW FORMAT DELIMITED FIELDS TERMINATED BY '$' 
STORED AS PARQUET  LOCATION '/user/hive/tmpta'

Fetched 0 row(s) in 0.09s
[node01.hd.com:21000] > select * from aaa; 
Query: select * from aaa
+-------+--------+-----------+------+--------------+------+------+--------+
| empno | ename  | job       | mgr  | hiredate     | sal  | comm | deptno |
+-------+--------+-----------+------+--------------+------+------+--------+
| 7369  | SMITH  | CLERK     | 7902 | 345830400000 | 800  | NULL | 20     |
| 7499  | ALLEN  | SALESMAN  | 7698 | 351446400000 | 1600 | 300  | 30     |
| 7521  | WARD   | SALESMAN  | 7698 | 351619200000 | 1250 | 500  | 30     |
| 7566  | JONES  | MANAGER   | 7839 | 354988800000 | 2975 | NULL | 20     |
| 7654  | MARTIN | SALESMAN  | 7698 | 370454400000 | 1250 | 1400 | 30     |
| 7698  | BLAKE  | MANAGER   | 7839 | 357494400000 | 2850 | NULL | 30     |
| 7782  | CLARK  | MANAGER   | 7839 | 360864000000 | 2450 | NULL | 10     |
| 7788  | SCOTT  | ANALYST   | 7566 | 545756400000 | 3000 | NULL | 20     |
| 7839  | KING   | PRESIDENT | NULL | 374774400000 | 5000 | NULL | 10     |
| 7844  | TURNER | SALESMAN  | 7698 | 368726400000 | 1500 | 0    | 30     |
| 7876  | ADAMS  | CLERK     | 7788 | 548694000000 | 1100 | NULL | 20     |
| 7900  | JAMES  | CLERK     | 7698 | 376156800000 | 950  | NULL | 30     |
| 7902  | FORD   | ANALYST   | 7566 | 376156800000 | 3000 | NULL | 20     |
| 7934  | MILLER | CLERK     | 7782 | 380563200000 | 1300 | NULL | 10     |
+-------+--------+-----------+------+--------------+------+------+--------+
Fetched 14 row(s) in 1.21s
[node01.hd.com:21000] > select from_unixtime(hiredate,'yyyy-Mm-dd') from aaa; 
Query: select from_unixtime(hiredate,'yyyy-Mm-dd') from aaa
+---------------------------------------+
| from_unixtime(hiredate, 'yyyy-mm-dd') |
+---------------------------------------+
| 1904-850-29                           |
| 1946-722-10                           |
| 1951-1222-31                          |
| 1922-953-04                           |
| 2004-629-14                           |
| 2002-153-27                           |
| 1972-1025-01                          |
| 1979-510-14                           |
| 2005-30-31                            |
| 1949-929-11                           |
| 1936-541-09                           |
| 1912-1232-13                          |
| 1912-1232-13                          |
| 1916-64-25                            |
+---------------------------------------+
Fetched 14 row(s) in 0.12s
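
The HIREDATE values above have the magnitude of epoch milliseconds, whereas from_unixtime expects seconds, which would explain the implausible years. The pattern 'yyyy-Mm-dd' also differs from the usual 'yyyy-MM-dd'. The following is a minimal Python sketch of the conversion, assuming the stored long really is milliseconds since 1970-01-01 UTC; the helper name millis_to_utc is hypothetical and only for illustration.

```python
from datetime import datetime, timezone


def millis_to_utc(ms: int) -> datetime:
    """Interpret ms as milliseconds since 1970-01-01 UTC (an assumption)."""
    # datetime.fromtimestamp expects seconds, so scale the value down first.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# HIREDATE for SMITH, copied from the query output above.
hiredate_smith = 345830400000
as_utc = millis_to_utc(hiredate_smith)
print(as_utc.isoformat())  # prints 1980-12-16T16:00:00+00:00
```

16:00 UTC on 1980-12-16 is midnight on 1980-12-17 in a UTC+8 zone, which is consistent with a date stored at local midnight; the time zone of the database host is a guess. In Impala the analogous step would presumably be to divide the column by 1000 before passing it to from_unixtime.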

How can I map a timestamp or date type into Parquet files?


> Exception when import data using Data Connector for Oracle with TIMESTAMP 
> column type to Parquet files
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1600
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1600
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>         Environment: Hadoop version: 2.5.0-cdh5.2.0
> Sqoop: 1.4.5
>            Reporter: Daniel Lanza García
>              Labels: Connector, Oracle, Parquet, Timestamp
>             Fix For: 1.4.6
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> An error is thrown in each mapper when an import job is run using the Quest 
> Data Connector for Oracle (--direct argument), the source table has a column 
> of type TIMESTAMP, and the destination files are in Parquet format.
> The mapper's log shows the following error:
> WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : 
> org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 
> 2012-7-1 0:4:44. 403000000
> This means that the data obtained by the mapper (from the connector) is not 
> of the type that the schema describes for this field. As we can read in the 
> error, the problem is related to the column UTC_STAMP (the only column in 
> the source table that stores a timestamp).
> If we check the generated schema for this column, we can see that the 
> column is of type long with SQL data type TIMESTAMP (93), which is correct.
> Schema: {"name" : "UTC_STAMP","type" : [ "long", "null" ],"columnName" : 
> "UTC_STAMP","sqlType" : "93"}
> If we debug the method where the exception is thrown 
> (org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:605)), we 
> can see that the problem arises because the data obtained by the mapper is 
> of type String, which does not match the type described by the schema 
> (long).
> The exception is not thrown when the destination files are text files, 
> because no schema is generated when importing to text files.
> Solution
> The documentation has a section that describes how to manage dates and 
> timestamps when using the Data Connector for Oracle and Hadoop. According 
> to that section, this connector handles this type of data differently. 
> However, this behavior can be disabled, as that section describes, with 
> the parameter below.
> -Doraoop.timestamp.string=false
> Although this parameter solves the problem (it is mandatory under these 
> conditions), the software should handle this type of column and not throw 
> an exception.
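
The type mismatch described in the issue can be illustrated with a small sketch that does not use Avro itself. It mimics the check for a ["long", "null"] union: only an integer in the 64-bit range or a null value is accepted, so a timestamp delivered as a string is rejected. The function name matches_long_null_union is hypothetical and is not an Avro API.

```python
from typing import Any

LONG_MIN = -2**63
LONG_MAX = 2**63 - 1


def matches_long_null_union(value: Any) -> bool:
    """Hypothetical stand-in for resolving a value against ["long", "null"]."""
    if value is None:
        return True  # matches the "null" branch
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return LONG_MIN <= value <= LONG_MAX  # matches the "long" branch
    return False  # strings and other types match neither branch


# Value from the error message in the issue: a timestamp rendered as a string.
timestamp_as_string = "2012-7-1 0:4:44. 403000000"
# Value of the kind seen after setting oraoop.timestamp.string=false.
timestamp_as_long = 345830400000

print(matches_long_null_union(timestamp_as_string))  # prints False
print(matches_long_null_union(timestamp_as_long))    # prints True
```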



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
