[ https://issues.apache.org/jira/browse/SQOOP-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liz Szilagyi reassigned SQOOP-3088:
-----------------------------------
Assignee: (was: Liz Szilagyi)
> Sqoop export with Parquet data failure does not contain the MapTask error
> -------------------------------------------------------------------------
>
> Key: SQOOP-3088
> URL: https://issues.apache.org/jira/browse/SQOOP-3088
> Project: Sqoop
> Issue Type: Bug
> Components: tools
> Reporter: Markus Kemper
> Priority: Major
>
> *Test Case*
> {noformat}
> #################
> # STEP 01 - Setup Table and Data
> #################
> export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c;
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_oracle"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_oracle (c1 int, c2 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_oracle values (1, 'data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_oracle"
> Output:
> -------------------------------------
> | C1 | C2 |
> -------------------------------------
> | 1 | data |
> -------------------------------------
> #################
> # STEP 02 - Import Data as Parquet
> #################
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir --num-mappers 1 --as-parquetfile
> Output:
> 16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Transferred 1.624 KB in 50.1693 seconds (33.1478 bytes/sec)
> 16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> #################
> # STEP 03 - Verify Parquet Data
> #################
> hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet
> parquet-tools schema -d hdfs://namenode.cloudera.com/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet
> Output:
> -rw-r--r-- 3 user1 user1 597 2016-12-21 07:11 /user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet
> ---
> message T1_ORACLE {
> optional binary C1 (UTF8);
> optional binary C2 (UTF8);
> }
> creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber})
> extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop import of T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","string"],"default":null,"columnName":"C2","sqlType":"12"}],"tableName":"T1_ORACLE"}
> file schema: T1_ORACLE
> ------------------------------------------------------------------------------------------------------------------------
> C1: OPTIONAL BINARY O:UTF8 R:0 D:1
> C2: OPTIONAL BINARY O:UTF8 R:0 D:1
> row group 1: RC:1 TS:85
> ------------------------------------------------------------------------------------------------------------------------
> C1: BINARY SNAPPY DO:0 FPO:4 SZ:40/38/0.95 VC:1 ENC:PLAIN,RLE,BIT_PACKED
> C2: BINARY SNAPPY DO:0 FPO:44 SZ:49/47/0.96 VC:1 ENC:PLAIN,RLE,BIT_PACKED
> #################
> # STEP 04 - Export Parquet Data
> #################
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --export-dir /user/user1/t1_oracle_parquet --num-mappers 1 --verbose
> Output:
> [sqoop debug]
> 16/12/21 07:15:06 INFO mapreduce.Job: map 0% reduce 0%
> 16/12/21 07:15:40 INFO mapreduce.Job: map 100% reduce 0%
> 16/12/21 07:15:40 INFO mapreduce.Job: Job job_1481911879790_0026 failed with state FAILED due to: Task failed task_1481911879790_0026_m_000000
> Job failed as tasks failed. failedMaps:1 failedReduces:0
> 16/12/21 07:15:40 INFO mapreduce.Job: Counters: 8
> Job Counters
> Failed map tasks=1
> Launched map tasks=1
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=32125
> Total time spent by all reduces in occupied slots (ms)=0
> Total time spent by all map tasks (ms)=32125
> Total vcore-seconds taken by all map tasks=32125
> Total megabyte-seconds taken by all map tasks=32896000
> 16/12/21 07:15:40 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
> 16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 46.8304 seconds (0 bytes/sec)
> 16/12/21 07:15:40 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
> 16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Exported 0 records.
> 16/12/21 07:15:40 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@577cfae6
> 16/12/21 07:15:40 ERROR tool.ExportTool: Error during export: Export job failed!
> [yarn debug]
> 2016-12-21 07:15:38,911 DEBUG [Thread-11] org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Committing transaction of 0 statements
> 2016-12-21 07:15:38,914 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://nameservice1/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet
> at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:241)
> at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
> at org.kitesdk.data.spi.filesystem.AbstractCombineFileRecordReader.nextKeyValue(AbstractCombineFileRecordReader.java:68)
> at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:69)
> at org.kitesdk.data.spi.AbstractKeyRecordReaderWrapper.nextKeyValue(AbstractKeyRecordReaderWrapper.java:55)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
> at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ClassCastException: T1_ORACLE cannot be cast to org.apache.avro.generic.IndexedRecord
> at parquet.avro.AvroIndexedRecordConverter.start(AvroIndexedRecordConverter.java:185)
> at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:391)
> at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:216)
> {noformat}
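
Until the export output includes the MapTask error itself, the [yarn debug] portion of the test case has to be pulled from the container logs by hand. The following is a minimal sketch of one way to do that, not part of the original report. It assumes the `yarn` CLI is on the PATH and that log aggregation is enabled on the cluster. The application ID is derived from the job ID in the Sqoop output by replacing the `job_` prefix with `application_`.

```shell
#!/bin/sh
# Job ID copied from the "Job job_... failed" line in the Sqoop output.
job_id="job_1481911879790_0026"

# A MapReduce job and its YARN application share the same numeric suffix.
app_id="application_${job_id#job_}"
echo "$app_id"

# Assumption: yarn is installed and log aggregation is enabled on the cluster.
if command -v yarn >/dev/null 2>&1; then
  # Print each root-cause line plus a few stack frames of context.
  yarn logs -applicationId "$app_id" | grep -A 5 'Caused by:' || true
fi
```

On the cluster from the test case this should surface the `ClassCastException` that the Sqoop client output omits.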