[ https://issues.apache.org/jira/browse/SQOOP-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nate Clevenger updated SQOOP-3003: ---------------------------------- Attachment: SQOOP-3003.patch Attaching patch with proposed change to OracleManager. Compiled and verified that a sqoop import run with a timestamp boundary-query produced the expected input splits according to the map task logs: {code} 2016-08-29 15:41:39,458 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: LAST_UTC_TS >= TO_TIMESTAMP('2016-08-28 09:00:00.0', 'YYYY-MM-DD HH24:MI:SS.FF') AND LAST_UTC_TS < TO_TIMESTAMP('2016-08-28 12:00:00.0', 'YYYY-MM-DD HH24:MI:SS.FF') {code} > Sqoop import fails to query with split-by/boundary-query using Oracle > Date/Timestamp > ------------------------------------------------------------------------------------ > > Key: SQOOP-3003 > URL: https://issues.apache.org/jira/browse/SQOOP-3003 > Project: Sqoop > Issue Type: Bug > Components: connectors/oracle > Reporter: Nate Clevenger > Attachments: SQOOP-3003.patch > > > Given the following example sqoop import command intended to import data from > an Oracle test_table, split-by a timestamp_column using a boundary query > (e.g. one-day range) with sqoop parallelism of eight: > {code} > sqoop import --connect jdbc:oracle:... --username <username> --password <pwd> > --target-dir /tmp/sqoop/test -m 8 --null-string '' --append --query "SELECT > primary_key, TO_CHAR(timestamp_column) FROM test_table WHERE primary_key != > 0 AND \$CONDITIONS" --split-by "timestamp_column" --boundary-query "SELECT > TO_TIMESTAMP('1970-01-01', > 'yyyy-mm-dd')+numtodsinterval(1472083200,'second'), > TO_TIMESTAMP('1970-01-01', 'yyyy-mm-dd')+numtodsinterval(1472169600,'second') > FROM DUAL" > {code} > The following exception is thrown by each map tasks: > {code} > Caused by: java.sql.SQLDataException: ORA-01843: not a valid month > at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447) > at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396) > at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951) > at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513) > at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:227) > at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531) > at > oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:208) > at > oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:886) > at > oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1175) > at > oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1296) > at > oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3613) > at > oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3657) > at > oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1495) > at > org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111) > at > org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235) > ... 12 more > {code} > Inspecting the source code, the issue appears to be attributed to > OracleManager failing to set the correct input format (should be > [OracleDataDrivenDBInputFormat|https://github.com/apache/sqoop/blob/release-1.4.6-rc3/src/java/org/apache/sqoop/mapreduce/db/OracleDataDrivenDBInputFormat.java#L47], > but appears to be getting set to > [DataDrivenDBInputFormat|https://github.com/apache/sqoop/blob/release-1.4.6-rc3/src/java/org/apache/sqoop/mapreduce/db/DataDrivenDBInputFormat.java#L79], > resulting in > [DateSplitter|https://github.com/apache/sqoop/blob/release-1.4.6-rc3/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java#L180] > being applied instead of > [OracleDateSplitter|https://github.com/apache/sqoop/blob/release-1.4.6-rc3/src/java/org/apache/sqoop/mapreduce/db/OracleDateSplitter.java#L30]). > OracleManager appears to apply the correct input format when using the > [\--table > option|https://github.com/apache/sqoop/blob/release-1.4.6-rc3/src/java/org/apache/sqoop/manager/OracleManager.java#L437] > in {{sqoop import}}, but doesn't apply a similar override when using the > [\--query > option|https://github.com/apache/sqoop/blob/release-1.4.6-rc3/src/java/org/apache/sqoop/manager/SqlManager.java#L676], > resulting in the input format being [defaulted to > DataDriveDBInputFormat|https://github.com/apache/sqoop/blob/a0b730c77e297a62909063289ef37a2b993ff5e1/src/java/org/apache/sqoop/manager/ImportJobContext.java#L42]. > This defect was tested using 1.4.6-cdh5.5.2-release and > 1.4.6-cdh5.6.0-release, and the affected code issue appears to still be > applicable as of the latest in trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)