[ 
https://issues.apache.org/jira/browse/SQOOP-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andre Araujo updated SQOOP-1946:
--------------------------------
    Description: 
When running a sqoop import job with a split-by column of data type DATE, the 
DateSplitter relies on the database implicit string-to-date convertion when 
generating the lower and upper bound clauses for the splits. The splits are 
generated with clauses similar to the following:

{code}
date_col >= '2013-08-26 00:00:00.0'
date_col <= '2013-08-26 00:00:00.0'
{code}

This forces the database to either implcitly convert the date_col to string or 
implicitly cast the literal string to a date type. In case the database default 
date format is not the expected, this could lead to either exceptions, as in 
the example below, or to unexpected behavior (string comparison of to dates in 
different string formats).

For Oracle databases, for example, we may see the following errors when the 
database cannot implicitly convert the string above to date:

{code}
2014-12-23 12:38:25,690 INFO [main] org.apache.hadoop.mapred.MapTask: 
Processing split: sales_date >= '2013-08-26 00:00:00.0' AND sales_date <= 
'2013-08-26 00:00:00.0'

2014-12-23 12:38:25,745 INFO [main] 
org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: sales_date >= 
'2013-08-26 00:00:00.0' AND sales_date <= '2013-08-26 00:00:00.0'

2014-12-23 12:38:25,860 INFO [main] 
org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: select * from 
schema.TABLE where ( date_col >= '2013-08-26 00:00:00.0' ) AND ( date_col <= 
'2013-08-26 00:00:00.0' )

2014-12-23 12:38:25,909 ERROR [main] 
org.apache.sqoop.mapreduce.db.DBRecordReader: Top level exception:  

java.sql.SQLDataException: ORA-01861: literal does not match format string 
{code}


A workaround for that is to set the default date format for the database user 
used by Sqoop to the "expected" format using a logon trigger in Oracle, like 
the one below:

{code}
CREATE OR REPLACE TRIGGER tr_a_l_set_date_format
   AFTER LOGON ON DATABASE WHEN (USER = 'SQOOP_USER')
   BEGIN
     execute immediate 'alter session set nls_date_format="yyyy-mm-dd 
hh24:mi:ss"';
   END;
{code}


A better form, though, would be to explicitly convert the date string literal 
to a DATE using a specific format, in the exact same way that 
OracleManager.datetimeToQueryString() does.


  was:
When running a sqoop import job with a split-by column of data type DATE, the 
DateSplitter relies on the database implicit string-to-date convertion when 
generating the lower and upper bound clauses for the splits. The splits are 
generated with clauses similar to the following:

date_col >= '2013-08-26 00:00:00.0'
date_col <= '2013-08-26 00:00:00.0'

This forces the database to either implcitly convert the date_col to string or 
implicitly cast the literal string to a date type. In case the database default 
date format is not the expected, this could lead to either exceptions, as in 
the example below, or to unexpected behavior (string comparison of to dates in 
different string formats).

For Oracle databases, for example, we may see the following errors when the 
database cannot implicitly convert the string above to date:

{code}
2014-12-23 12:38:25,690 INFO [main] org.apache.hadoop.mapred.MapTask: 
Processing split: sales_date >= '2013-08-26 00:00:00.0' AND sales_date <= 
'2013-08-26 00:00:00.0'

2014-12-23 12:38:25,745 INFO [main] 
org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: sales_date >= 
'2013-08-26 00:00:00.0' AND sales_date <= '2013-08-26 00:00:00.0'

2014-12-23 12:38:25,860 INFO [main] 
org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: select * from 
schema.TABLE where ( date_col >= '2013-08-26 00:00:00.0' ) AND ( date_col <= 
'2013-08-26 00:00:00.0' )

2014-12-23 12:38:25,909 ERROR [main] 
org.apache.sqoop.mapreduce.db.DBRecordReader: Top level exception:  

java.sql.SQLDataException: ORA-01861: literal does not match format string 
{code}


A workaround for that is to set the default date format for the database user 
used by Sqoop to the "expected" format using a logon trigger in Oracle, like 
the one below:

{code}
CREATE OR REPLACE TRIGGER tr_a_l_set_date_format
   AFTER LOGON ON DATABASE WHEN (USER = 'SQOOP_USER')
   BEGIN
     execute immediate 'alter session set nls_date_format="yyyy-mm-dd 
hh24:mi:ss.ff"';
   END;
{code}


A better form, though, would be to explicitly convert the date string literal 
to a DATE using a specific format, in the exact same way that 
OracleManager.datetimeToQueryString() does.



> DateSplitter relies on database string-to-date conversion when creating 
> splits based on date columns
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1946
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1946
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/generic
>    Affects Versions: 1.4.5
>            Reporter: Andre Araujo
>
> When running a sqoop import job with a split-by column of data type DATE, the 
> DateSplitter relies on the database implicit string-to-date convertion when 
> generating the lower and upper bound clauses for the splits. The splits are 
> generated with clauses similar to the following:
> {code}
> date_col >= '2013-08-26 00:00:00.0'
> date_col <= '2013-08-26 00:00:00.0'
> {code}
> This forces the database to either implcitly convert the date_col to string 
> or implicitly cast the literal string to a date type. In case the database 
> default date format is not the expected, this could lead to either 
> exceptions, as in the example below, or to unexpected behavior (string 
> comparison of to dates in different string formats).
> For Oracle databases, for example, we may see the following errors when the 
> database cannot implicitly convert the string above to date:
> {code}
> 2014-12-23 12:38:25,690 INFO [main] org.apache.hadoop.mapred.MapTask: 
> Processing split: sales_date >= '2013-08-26 00:00:00.0' AND sales_date <= 
> '2013-08-26 00:00:00.0'
> 2014-12-23 12:38:25,745 INFO [main] 
> org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: sales_date >= 
> '2013-08-26 00:00:00.0' AND sales_date <= '2013-08-26 00:00:00.0'
> 2014-12-23 12:38:25,860 INFO [main] 
> org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: select * from 
> schema.TABLE where ( date_col >= '2013-08-26 00:00:00.0' ) AND ( date_col <= 
> '2013-08-26 00:00:00.0' )
> 2014-12-23 12:38:25,909 ERROR [main] 
> org.apache.sqoop.mapreduce.db.DBRecordReader: Top level exception:  
> java.sql.SQLDataException: ORA-01861: literal does not match format string 
> {code}
> A workaround for that is to set the default date format for the database user 
> used by Sqoop to the "expected" format using a logon trigger in Oracle, like 
> the one below:
> {code}
> CREATE OR REPLACE TRIGGER tr_a_l_set_date_format
>    AFTER LOGON ON DATABASE WHEN (USER = 'SQOOP_USER')
>    BEGIN
>      execute immediate 'alter session set nls_date_format="yyyy-mm-dd 
> hh24:mi:ss"';
>    END;
> {code}
> A better form, though, would be to explicitly convert the date string literal 
> to a DATE using a specific format, in the exact same way that 
> OracleManager.datetimeToQueryString() does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to