[jira] [Updated] (SQOOP-1600) Exception when import data using Data Connector for Oracle with TIMESTAMP column type to Parquet files

Qian Xu (JIRA) Wed, 27 May 2015 02:23:52 -0700

     [ https://issues.apache.org/jira/browse/SQOOP-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Qian Xu updated SQOOP-1600:
---------------------------
    Description: 
An error is thrown in each mapper when an import job is run using the Quest 
Data Connector for Oracle ({{--direct}} argument), the source table has a column 
of type TIMESTAMP, and the destination files are in Parquet format.

The mapper's log shows the following error:
{code}
WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : 
org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 
2012-7-1 0:4:44. 403000000
{code}
This means that the data obtained by the mapper (from the connector) is not of 
the type that the schema describes for this field. As we can see from the 
error, the problem is related to the column UTC_STAMP (the only column in the 
source table that stores a timestamp).
If we check the generated schema for this column, we can see that the column 
is of type long with SQL data type TIMESTAMP (93), which is correct.
{code}
Schema: {"name" : "UTC_STAMP","type" : [ "long", "null" ],"columnName" : "UTC_STAMP","sqlType" : "93"}
{code}
If we debug the method where the exception is thrown, 
{{org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:605)}}, we 
can see that the problem occurs because the data obtained by the mapper is a 
String, which does not match the type described by the schema (long). The 
exception is not thrown when the destination files are text files, because no 
schema is generated when importing to text files.
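To make the mismatch concrete, here is a simplified Java model of resolving a value against the union ["long","null"] declared for UTC_STAMP. It is not Avro's {{GenericData.resolveUnion}}: the class {{UnionCheck}} and its methods are hypothetical, and a plain {{IllegalStateException}} stands in for Avro's {{UnresolvedUnionException}}. A Long or null resolves to a branch of the union, while a String like the one the connector hands to the mapper does not.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of Avro union resolution for the UTC_STAMP field.
// It only mimics the type check that fails in SQOOP-1600.
class UnionCheck {

    // Branches of the union declared for UTC_STAMP in the generated schema.
    static final List<String> UTC_STAMP_UNION = Arrays.asList("long", "null");

    // Map a Java value to the Avro primitive type name it would match.
    static String avroTypeOf(Object datum) {
        if (datum == null) return "null";
        if (datum instanceof Long) return "long";
        if (datum instanceof Integer) return "int";
        if (datum instanceof CharSequence) return "string";
        return datum.getClass().getName();
    }

    // Return the index of the matching branch; throw when no branch matches
    // (Avro throws UnresolvedUnionException here, this sketch uses IllegalStateException).
    static int resolveUnion(List<String> union, Object datum) {
        int i = union.indexOf(avroTypeOf(datum));
        if (i < 0) {
            throw new IllegalStateException("Not in union " + union + ": " + datum);
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(resolveUnion(UTC_STAMP_UNION, 1341101084403L)); // 0 ("long" branch)
        System.out.println(resolveUnion(UTC_STAMP_UNION, null));           // 1 ("null" branch)
        try {
            // A String datum, as produced by the connector, matches no branch.
            resolveUnion(UTC_STAMP_UNION, "2012-7-1 0:4:44.403000000");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Not in union [long, null]: 2012-7-1 0:4:44.403000000
        }
    }
}
```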

Solution:

The documentation has a section describing how dates and timestamps are 
handled when using the Data Connector for Oracle and Hadoop. As that section 
explains, this connector handles this type of data in its own way. However, 
this behavior can be disabled, as described in that section, with the 
following parameter:
{{-Doraoop.timestamp.string=false}}

Although this parameter solves the problem (and is mandatory under these 
conditions), the software should handle this column type instead of throwing 
an exception.
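One possible way for the software to cope, sketched under stated assumptions: convert the connector's timestamp string to epoch milliseconds before it is written to the ["long","null"] field. The class {{TimestampToLong}} is hypothetical and is not Sqoop's actual fix. It assumes the string has the shape of the value in the log without the space after the decimal point (for example {{2012-7-1 0:4:44.403000000}}) and that the value is in UTC, as the column name UTC_STAMP suggests.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Sqoop: turns the connector's timestamp string
// into the epoch milliseconds that the generated ["long","null"] field expects.
class TimestampToLong {

    // Single-letter fields accept one or more digits, so values such as
    // "7", "1", "0" and "4" parse without zero padding; nine fractional digits follow the dot.
    private static final DateTimeFormatter CONNECTOR_STRING =
            DateTimeFormatter.ofPattern("uuuu-M-d H:m:s.SSSSSSSSS");

    static Long toEpochMillis(String value) {
        if (value == null) {
            return null; // goes to the "null" branch of the union
        }
        LocalDateTime ts = LocalDateTime.parse(value.trim(), CONNECTOR_STRING);
        // Assumption: the stored value is UTC (column name UTC_STAMP).
        return ts.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2012-7-1 0:4:44.403000000")); // 1341101084403
    }
}
```

Returning a boxed Long keeps null values mapped to the union's "null" branch instead of failing with a NullPointerException.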



> Exception when import data using Data Connector for Oracle with TIMESTAMP 
> column type to Parquet files
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1600
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1600
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>         Environment: Hadoop version: 2.5.0-cdh5.2.0
> Sqoop: 1.4.5
>            Reporter: Daniel Lanza García
>            Assignee: Qian Xu
>              Labels: Connector, Oracle, Parquet, Timestamp
>             Fix For: 1.4.7
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
