[ https://issues.apache.org/jira/browse/SQOOP-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586736#comment-15586736 ]
Attila Szabo commented on SQOOP-3030:
-------------------------------------

Hi [~markuskem...@me.com],

IMHO this issue is connected to [SQOOP-3029], or is a subset of it (I think it has no connection to HCatalog). Can you confirm this? If yes, I think we could provide a quite straightforward solution for it.

Thanks,
[~maugli]

> Export to Oracle using (--direct + --hcatalog + --columns) with columns
> having the wrong case sensitivity fails with NPE
> ------------------------------------------------------------------------
>
>                 Key: SQOOP-3030
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3030
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/oracle, hive-integration
>            Reporter: Markus Kemper
>
> It appears that if the case of the column names in the Oracle RDBMS does not
> match the values given to the Sqoop --columns option (with export +
> --hcatalog + --direct), the Sqoop map task fails with an NPE, whereas the
> non-direct path fails correctly. My test case is below; please review and
> let me know if you have any questions.
> Additionally, it would be nice if we could detect this column case mismatch
> at compile time rather than submit the job only to have it fail in YARN.
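The compile-time detection the reporter asks for could be sketched as a case-insensitive resolution of the user-supplied --columns values against the table metadata, done before job submission. This is a hypothetical illustration in Python, not actual Sqoop code; `resolve_columns` is an invented name:

```python
def resolve_columns(user_columns, db_columns):
    """Map user-supplied column names onto the database's canonical
    (case-exact) names, failing fast on a genuine mismatch.

    user_columns: names passed via --columns, in any case.
    db_columns:   names as reported by the RDBMS metadata.
    """
    # Index the canonical names by their upper-cased form, mirroring
    # Oracle's folding of unquoted identifiers to upper case.
    canonical = {name.upper(): name for name in db_columns}
    resolved = []
    for col in user_columns:
        match = canonical.get(col.upper())
        if match is None:
            raise ValueError("Column %r not found in table (have: %s)"
                             % (col, ", ".join(db_columns)))
        resolved.append(match)
    return resolved
```

With this in place, `--columns "c1,c2"` against an Oracle table with columns C1, C2 would be silently normalized to `["C1", "C2"]`, while a truly unknown column would be rejected at submit time instead of failing in YARN.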
> TEST CASE:
>
> STEP 01 : Create Oracle Source/Target Table
>
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
> -------------------------------------
> | C1          | C2         |
> -------------------------------------
> | 1           | some data  |
> -------------------------------------
>
> STEP 02 : Import Oracle Table using Sqoop --hcatalog options
>
> beeline -u jdbc:hive2:// -e "use default; drop table t1_text;"
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --num-mappers 1 --hcatalog-database default --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile'
> beeline -u jdbc:hive2:// -e "use default; select * from t1_text;"
>
> Output:
> <SNIP>
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column names projected : [c1, c2]
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column name - info map :
>     c1 : [Type : 2,Precision : 38,Scale : 0]
>     c2 : [Type : 12,Precision : 10,Scale : 0]
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.t1_text for import
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement:
> create table `default`.`t1_text` (
>     `c1` decimal(38),
>     `c2` varchar(10))
> stored as textfile
> <SNIP>
> 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Transferred 12 bytes in 34.4222 seconds (0.3486 bytes/sec)
> 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ---
> +-------------+-------------+--+
> | t1_text.c1  | t1_text.c2  |
> +-------------+-------------+--+
> | 1           | some data   |
> +-------------+-------------+--+
>
> STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns options
>
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns "c1,c2"
>
> Output: (failure = correct)
> <SNIP>
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column names projected : [c1, c2]
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column name - info map :
>     c1 : [Type : 2,Precision : 38,Scale : 0]
>     c2 : [Type : 12,Precision : 10,Scale : 0]
> <SNIP>
> 16/10/18 09:13:48 INFO hive.metastore: Connected to metastore.
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: HCatalog full table schema fields = [c1, c2]
> 16/10/18 09:13:49 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning key fields = []
> 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog projected schema fields = [c1, c2]
> <SNIP>
> 16/10/18 09:14:22 INFO mapreduce.ExportJobBase: Exported 0 records.
> 16/10/18 09:14:22 ERROR tool.ExportTool: Error during export: Export job failed!
> ---
> 2016-10-18 09:14:19,418 INFO [main] org.apache.hive.hcatalog.mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with properties {transient_lastDdlTime=1476807097, name=default.t1_text, serialization.null.format=\N, columns=c1,c2, serialization.lib=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, serialization.format=1, columns.types=decimal(38,0),varchar(10)}
> 2016-10-18 09:14:19,660 INFO [Thread-12] org.apache.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is finished.
> keepGoing=false
> 2016-10-18 09:14:19,952 ERROR [Thread-11] org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Got exception in update thread: java.sql.SQLSyntaxErrorException: ORA-00904: "c2": invalid identifier
>
> STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns options and Oracle --direct
>
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns "c1,c2" --direct
>
> Output: (failure = not correct (NPE))
> <SNIP>
> 16/10/18 09:17:47 INFO oracle.OraOopManagerFactory:
> **************************************************
> *** Using Data Connector for Oracle and Hadoop ***
> **************************************************
> 16/10/18 09:17:47 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
> 16/10/18 09:17:47 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.
> 16/10/18 09:17:47 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
> 16/10/18 09:17:47 INFO tool.CodeGenTool: Beginning code generation
> 16/10/18 09:17:47 INFO manager.SqlManager: Executing SQL statement: SELECT "C1","C2" FROM T2 WHERE 0=1
> <SNIP>
> 16/10/18 09:17:50 INFO mapreduce.ExportJobBase: Configuring HCatalog for export job
> 16/10/18 09:17:50 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
> 16/10/18 09:17:50 INFO manager.SqlManager: Executing SQL statement: SELECT "C1","C2" FROM "T2" WHERE 1=0
> 16/10/18 09:17:50 INFO hcat.SqoopHCatUtilities: Database column names projected : [c1, c2]
> 16/10/18 09:17:50 INFO hcat.SqoopHCatUtilities: Database column name - info map :
>     c1 : [Type : 2,Precision : 38,Scale : 0]
>     c2 : [Type : 12,Precision : 10,Scale : 0]
> <SNIP>
> 16/10/18 09:17:51 INFO hive.metastore: Connected to metastore.
> 16/10/18 09:17:52 INFO hcat.SqoopHCatUtilities: HCatalog full table schema fields = [c1, c2]
> <SNIP>
> 16/10/18 09:18:25 INFO mapreduce.ExportJobBase: Exported 0 records.
> 16/10/18 09:18:25 ERROR tool.ExportTool: Error during export: Export job failed!
> ---
> 2016-10-18 09:18:23,561 INFO [main] org.apache.sqoop.manager.oracle.OraOopOutputFormatBase: The number of rows per batch is: 100
> 2016-10-18 09:18:23,561 INFO [main] org.apache.sqoop.manager.oracle.OraOopOutputFormatBase: The number of batches per commit is: 100
> 2016-10-18 09:18:23,721 INFO [main] org.apache.sqoop.manager.oracle.OraOopOutputFormatBase: This record writer is connected to Oracle via the JDBC URL: "oracle.jdbc.driver.T4CConnection@34133979" to the Oracle instance: "ORCL"
> 2016-10-18 09:18:23,828 INFO [main] org.apache.sqoop.manager.oracle.OraOopOracleQueries: Session Time Zone set to GMT
> 2016-10-18 09:18:23,883 INFO [main] org.apache.sqoop.manager.oracle.OracleConnectionFactory: Initializing Oracle session with SQL :
> begin
>     dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'export 20161018091747PDT');
> end;
> 2016-10-18 09:18:23,883 WARN [main] org.apache.sqoop.manager.oracle.OracleConnectionFactory: No Oracle 'session initialization' statements were found to execute.
> Check that your oraoop-site-template.xml and/or oraoop-site.xml files are correctly installed in the ${SQOOP_HOME}/conf directory.
> 2016-10-18 09:18:24,116 INFO [main] org.apache.hive.hcatalog.mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with properties {transient_lastDdlTime=1476807097, name=default.t1_text, serialization.null.format=\N, columns=c1,c2, serialization.lib=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, serialization.format=1, columns.types=decimal(38,0),varchar(10)}
> 2016-10-18 09:18:24,345 INFO [Thread-12] org.apache.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
> 2016-10-18 09:18:24,357 INFO [main] org.apache.sqoop.manager.oracle.OraOopOutputFormatBase: Batch-Mode insert statement:
> insert into "SQOOP"."T2"
> ("C1"
> ,"C2")
> values
> (:C1
> ,:C2)
> 2016-10-18 09:18:24,358 ERROR [main] org.apache.sqoop.manager.oracle.OraOopOutputFormatInsert: The following error occurred during configurePreparedStatement()
> java.lang.NullPointerException
>     at org.apache.sqoop.manager.oracle.OraOopOutputFormatBase$OraOopDBRecordWriterBase.setBindValueAtName(OraOopOutputFormatBase.java:432)
>
> STEP 04 : Verify Export without --direct and correct column case works
>
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns "C1,C2"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
>
> Output:
> 16/10/18 09:26:46 INFO mapreduce.ExportJobBase: Transferred 7.79 KB in 30.9303 seconds (257.9025 bytes/sec)
> 16/10/18 09:26:46 INFO mapreduce.ExportJobBase: Exported 1 records.
> ---
> -------------------------------------
> | C1          | C2         |
> -------------------------------------
> | 1           | some data  |
> | 1           | some data  |
> -------------------------------------
>
> STEP 05 : Verify Export with --direct and correct column case works
>
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns "C1,C2" --direct
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
>
> Output:
> <SNIP>
> 16/10/18 09:30:40 INFO oracle.OraOopManagerFactory:
> **************************************************
> *** Using Data Connector for Oracle and Hadoop ***
> **************************************************
> <SNIP>
> 16/10/18 09:31:16 INFO mapreduce.ExportJobBase: Transferred 7.79 KB in 30.9987 seconds (257.333 bytes/sec)
> 16/10/18 09:31:16 INFO mapreduce.ExportJobBase: Exported 1 records.
> ---
> -------------------------------------
> | C1          | C2         |
> -------------------------------------
> | 1           | some data  |
> | 1           | some data  |
> | 1           | some data  |
> -------------------------------------

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
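A note on why the two failure modes above differ (my reading of the logs, not verified against the Sqoop source): Oracle folds unquoted identifiers to upper case, so the non-direct path's generated SQL, which quotes the user-supplied lower-case names, fails cleanly with ORA-00904 "c2": invalid identifier. The direct connector instead builds its bind names from Oracle's upper-case metadata (:C1, :C2) but appears to look up the record's field values with the user's lower-case names; the failed case-sensitive lookup then surfaces as the NPE in setBindValueAtName. A minimal Python model of that lookup mismatch (all names here are illustrative, not the connector's actual API):

```python
def bind_values(bind_names, record):
    """Exact-match (case-sensitive) lookup, as the failing path behaves:
    every bind resolves to nothing, which real code then dereferences."""
    return {name: record.get(name) for name in bind_names}

def bind_values_folded(bind_names, record):
    """Case-insensitive lookup, as a fix might behave."""
    folded = {k.upper(): v for k, v in record.items()}
    return {name: folded.get(name.upper()) for name in bind_names}

record = {"c1": 1, "c2": "some data"}   # HCatalog record fields, lower case
binds = ["C1", "C2"]                    # bind names from Oracle metadata

assert bind_values(binds, record) == {"C1": None, "C2": None}
assert bind_values_folded(binds, record) == {"C1": 1, "C2": "some data"}
```

If this reading is right, it also suggests why normalizing --columns to the database's canonical case before submission (or folding the lookup) would fix both the NPE and the ORA-00904 path at once.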