[ https://issues.apache.org/jira/browse/SQOOP-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandish Kumar HN reassigned SQOOP-3182:
---------------------------------------

    Assignee: Sandish Kumar HN

> Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with (Can't parse input data: 'PAR1')
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-3182
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3182
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Markus Kemper
>            Assignee: Sandish Kumar HN
>
> Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with (Can't parse input data: 'PAR1'). See test case below.
> *Test Case*
> {noformat}
> #################
> # STEP 01 - Create Table and Data
> #################
> export MYCONN=jdbc:oracle:thin:@oracle.sqoop.com:1521/db11g;
> export MYUSER=sqoop
> export MYPSWD=sqoop
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(10), c4 timestamp)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, sysdate, 'NEW ROW 1', sysdate)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
> Output:
> ------------------------------------------------------------------
> | C1 | C2                    | C3        | C4                  |
> ------------------------------------------------------------------
> | 1  | 2017-05-06 06:59:02.0 | NEW ROW 1 | 2017-05-06 06:59:02 |
> ------------------------------------------------------------------
> #################
> # STEP 02 - Import Data into HDFS
> #################
> hdfs dfs -rm -r /user/root/t1
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-01-01 00:00:00.0' --as-parquetfile --map-column-java C2=String,C4=String --num-mappers 1 --verbose
> hdfs dfs -ls /user/root/t1/*.parquet
> parquet-tools cat --json 'hdfs://namenode/user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet'
> Output:
> 17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Transferred 2.627 KB in 23.6174 seconds (113.8988 bytes/sec)
> 17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 17/05/06 07:01:34 INFO tool.ImportTool: --last-value 2017-05-06 07:01:09.0
> ~~~~~
> -rw-r--r--   3 root root       1144 2017-05-06 07:01 /user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet
> ~~~~~
> {"C1":"1","C2":"2017-05-06 06:59:02.0","C3":"NEW ROW 1","C4":"2017-05-06 06:59:02"}
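> # Sanity check (added for illustration, not part of the original run):
> # every Parquet file begins with the 4-byte magic header "PAR1" - the
> # same literal the merge job chokes on in STEP 06 below. The file name
> # is the one from the ls output above:
> hdfs dfs -cat /user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet | head -c 4
> # expected output: PAR1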
> #################
> # STEP 03 - Insert New Row and Update Existing Row
> #################
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (2, sysdate, 'NEW ROW 2', sysdate)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "update t1 set c3 = 'UPDATE 1', c4 = sysdate where c1 = 1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 order by c1"
> Output:
> ------------------------------------------------------------------
> | C1 | C2                    | C3        | C4                  |
> ------------------------------------------------------------------
> | 1  | 2017-05-06 06:59:02.0 | UPDATE 1  | 2017-05-06 07:04:40 |
> | 2  | 2017-05-06 07:04:38.0 | NEW ROW 2 | 2017-05-06 07:04:38 |
> ------------------------------------------------------------------
> #################
> # STEP 04 - Import Data into HDFS and Merge changes
> #################
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-05-06 07:01:09.0' --as-parquetfile --map-column-java C2=String,C4=String --num-mappers 1 --verbose
> Output:
> 17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Transferred 2.6611 KB in 27.4934 seconds (99.1148 bytes/sec)
> 17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Retrieved 2 records.
> 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@121fdcee
> 17/05/06 07:06:43 INFO tool.ImportTool: Final destination exists, will run merge job.
> 17/05/06 07:06:43 DEBUG tool.ImportTool: Using temporary folder: 4bc6b65cd0194b81938f4660974ee392_T1
> 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Checking for existing class: T1
> 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Attempting to load jar through URL: jar:file:/tmp/sqoop-root/compile/6ed24910abcbc6ea38a1963bfce9a92d/codegen_T1.jar!/
> 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Previous classloader is java.net.FactoryURLClassLoader@121fdcee
> 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Testing class in jar: T1
> 17/05/06 07:06:43 ERROR tool.ImportTool: Import failed: java.io.IOException: Could not load jar /tmp/sqoop-root/compile/6ed24910abcbc6ea38a1963bfce9a92d/codegen_T1.jar into JVM. (Could not find class T1.)
>         at org.apache.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:92)
>         at com.cloudera.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:36)
>         at org.apache.sqoop.tool.ImportTool.loadJars(ImportTool.java:120)
>         at org.apache.sqoop.tool.ImportTool.lastModifiedMerge(ImportTool.java:456)
>         at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:522)
>         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
>         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
> Caused by: java.lang.ClassNotFoundException: T1
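> # Diagnostic sketch (added for illustration; the jar path is copied from
> # the DEBUG log above): listing the auto-generated jar shows whether it
> # actually contains a top-level T1.class, which the merge job tries to
> # load. STEP 05 works around the missing class by generating it
> # explicitly with --class-name T1:
> jar tf /tmp/sqoop-root/compile/6ed24910abcbc6ea38a1963bfce9a92d/codegen_T1.jar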
> #################
> # STEP 05 - Generate Java Class (using codegen)
> #################
> hdfs dfs -ls /user/root/t1/*.parquet
> rm -rf /root/sqoop/jobs/* /root/sqoop/bindir/*
> sqoop codegen --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --map-column-java C2=String,C4=String --class-name T1 --bindir /root/sqoop/bindir/
> ls -l /root/sqoop/jobs/T1* /root/sqoop/bindir/T1*
> Output:
> -rw-r--r--   3 root root       1144 2017-05-06 07:01 /user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet
> ~~~~~
> -rw-r--r-- 1 root root   562 May  6 07:35 /root/sqoop/bindir/T1$1.class
> -rw-r--r-- 1 root root   550 May  6 07:35 /root/sqoop/bindir/T1$2.class
> -rw-r--r-- 1 root root   550 May  6 07:35 /root/sqoop/bindir/T1$3.class
> -rw-r--r-- 1 root root   550 May  6 07:35 /root/sqoop/bindir/T1$4.class
> -rw-r--r-- 1 root root 10891 May  6 07:35 /root/sqoop/bindir/T1.class
> -rw-r--r-- 1 root root   207 May  6 07:35 /root/sqoop/bindir/T1$FieldSetterCommand.class
> -rw-r--r-- 1 root root  6563 May  6 07:35 /root/sqoop/bindir/T1.jar
> -rw-r--r-- 1 root root 14170 May  6 07:35 /root/sqoop/jobs/T1.java
> #################
> # STEP 06 - Import Data into HDFS and Merge changes
> #################
> hdfs dfs -ls /user/root/t1/*.parquet
> hdfs dfs -text /user/root/t1/*.parquet
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-05-06 07:01:09.0' --as-parquetfile --map-column-java C2=String,C4=String --num-mappers 1 --verbose --jar-file /root/sqoop/bindir/T1.jar --class-name T1
> Output:
> -rw-r--r--   3 root root       1144 2017-04-11 13:35 /user/root/t1/54614794-b970-4e74-a0d7-669105f29a2b.parquet
> ~~~~~
> PAR111
> (1>B2017-05-06 06:59:02.02017-05-06 06:59:02.0x2017-05-06 06:59:02.0&*,
> NEW ROW 1 NEW ROW 1H NEW ROW 1:>2017-05-06 06:59:022017-05-06 06:59:02p2017-05-06 06:59:02\HT1
> <SNIP>
> ~~~~~
> 17/05/06 07:44:31 INFO mapreduce.ImportJobBase: Transferred 2.6611 KB in 25.6244 seconds (106.3442 bytes/sec)
> 17/05/06 07:44:31 INFO mapreduce.ImportJobBase: Retrieved 2 records.
> ~~~~~
> 17/05/06 07:44:57 INFO mapreduce.Job: Task Id : attempt_1491569377351_0244_m_000001_0, Status : FAILED
> Error: java.lang.RuntimeException: Can't parse input data: 'PAR1'
>         at T1.__loadFromFields(T1.java:365)
>         at T1.parse(T1.java:298)
>         at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53)
>         at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NumberFormatException
>         at java.math.BigDecimal.<init>(BigDecimal.java:470)
>         at java.math.BigDecimal.<init>(BigDecimal.java:739)
>         at T1.__loadFromFields(T1.java:347)
> {noformat}
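> *Root-cause sketch (inferred from the stack trace above, not stated in the original report):* the failing mapper is org.apache.sqoop.mapreduce.MergeTextMapper, which splits each input line as delimited text and hands the fields to the generated T1.parse(). Since every Parquet file starts with the magic header 'PAR1', the first "field" reaching T1.__loadFromFields() is the literal PAR1, and parsing it as the numeric column C1 raises the NumberFormatException. In other words, the merge job reads the existing Parquet target directory as if it were text.
> {noformat}
> # Workaround sketch (assumptions: a text-format target is acceptable, and
> # the directory name /user/root/t1_text is made up for this example).
> # Dropping --as-parquetfile keeps the target in delimited text, which the
> # MergeTextMapper-based merge path can actually parse:
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
>   --table T1 --target-dir /user/root/t1_text --incremental lastmodified \
>   --check-column C4 --merge-key C1 --last-value '2017-05-06 07:01:09.0' \
>   --map-column-java C2=String,C4=String --num-mappers 1
> {noformat}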
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)