Luca,

What are your reader and writer schemas? It looks like they may not
match because the reader expects an Integer but is deserializing a
Contig object.
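If they do differ, one thing to try is pinning both jobs to the same
generated schema, rather than letting the reader fall back to whatever
it resolves at read time. A minimal sketch (the SchemaSetup class and
the writeJob/readJob handles are just placeholders for your own job
setup, not taken from your code):

    import org.apache.avro.Schema;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.parquet.avro.AvroParquetInputFormat;
    import org.apache.parquet.avro.AvroParquetOutputFormat;
    import org.bdgenomics.formats.avro.AlignmentRecord;

    public class SchemaSetup {
        // Use the schema compiled into the generated class on both
        // sides, so the reader decodes with the same field layout the
        // writer used.
        public static void configure(Job writeJob, Job readJob) {
            Schema schema = AlignmentRecord.getClassSchema();
            AvroParquetOutputFormat.setSchema(writeJob, schema);
            AvroParquetInputFormat.setAvroReadSchema(readJob, schema);
        }
    }

You can also compare the two directly: parquet-tools meta on the file
should print the Avro schema the writer embedded in the footer, which
you can diff against AlignmentRecord.getClassSchema() on the reader's
classpath.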
rb

On Fri, Mar 4, 2016 at 3:58 AM, Luca Pireddu <[email protected]> wrote:
> Hello all,
>
> I'm using AvroParquetOutputFormat and AvroParquetInputFormat for a
> pair of Hadoop applications -- one that writes avro-parquet and one
> that reads. Actually, I'm using Pydoop
> (https://github.com/crs4/pydoop) but the actual I/O is done through
> the AvroParquet classes.
>
> The writer seems to succeed. The reader, however, crashes with a
> ParquetDecodingException when processing the other application's
> output. Here's the syslog output with the stack trace:
>
> 2016-03-04 12:46:50,075 INFO [main] org.apache.hadoop.mapred.MapTask:
> Processing split: ParquetInputSplit{part:
> hdfs://localhost:9000/user/pireddu/seqal_mini_ref_bwamem_avo_output/tmp/part-m-00000.parquet
> start: 0 end: 16916 length: 16916 hosts: []}
> 2016-03-04 12:46:50,846 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : org.apache.parquet.io.ParquetDecodingException:
> Can not read value at 1 in block 0 in file
> hdfs://localhost:9000/user/pireddu/seqal_mini_ref_bwamem_avo_output/tmp/part-m-00000.parquet
>     at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
>     at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
>     at it.crs4.pydoop.mapreduce.pipes.PydoopAvroBridgeReaderBase.initialize(PydoopAvroBridgeReaderBase.java:66)
>     at it.crs4.pydoop.mapreduce.pipes.PydoopAvroBridgeValueReader.initialize(PydoopAvroBridgeValueReader.java:38)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:545)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:783)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ClassCastException:
> org.bdgenomics.formats.avro.Contig cannot be cast to java.lang.Integer
>     at org.bdgenomics.formats.avro.AlignmentRecord.put(AlignmentRecord.java:258)
>     at org.apache.parquet.avro.AvroIndexedRecordConverter.set(AvroIndexedRecordConverter.java:168)
>     at org.apache.parquet.avro.AvroIndexedRecordConverter.access$000(AvroIndexedRecordConverter.java:46)
>     at org.apache.parquet.avro.AvroIndexedRecordConverter$1.add(AvroIndexedRecordConverter.java:95)
>     at org.apache.parquet.avro.AvroIndexedRecordConverter.end(AvroIndexedRecordConverter.java:189)
>     at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:413)
>     at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:218)
>     ... 11 more
>
> I'm using parquet 1.8.1 and avro 1.7.6. I'm able to read the parquet
> file with parquet-tools-1.8.1, so I'm inclined to think that the file
> is valid.
>
> Contig is the first class defined in my avro schema:
>
> file schema: org.bdgenomics.formats.avro.AlignmentRecord
> --------------------------------------------------------------------------------
> contig:         OPTIONAL F:6
> .contigName:    OPTIONAL BINARY O:UTF8 R:0 D:2
> .contigLength:  OPTIONAL INT64 R:0 D:2
> ...and so on.
>
> Can someone suggest what might be causing the problem when reading?
> Any help would be appreciated!
>
> Thanks,
> Luca

--
Ryan Blue
Software Engineer
Netflix
