Luca,

What are your reader and writer schemas? It looks like they may not match,
because the reader expects an Integer but is deserializing a Contig object.
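
If it helps, here's a minimal sketch of how I'd wire up the two jobs so
they agree on a single schema, assuming you're using the generated
bdg-formats AlignmentRecord class (the job setup below is just an
illustration, not your actual Pydoop configuration):

    import org.apache.avro.Schema;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.parquet.avro.AvroParquetInputFormat;
    import org.apache.parquet.avro.AvroParquetOutputFormat;
    import org.bdgenomics.formats.avro.AlignmentRecord;

    public class SchemaWiring {
        public static void main(String[] args) throws Exception {
            // Take the schema from the generated class so both jobs are
            // compiled against the same version of bdg-formats.
            Schema schema = AlignmentRecord.getClassSchema();

            // Writer job: the Avro schema the parquet file is written with.
            Job writeJob = Job.getInstance(new Configuration(), "writer");
            AvroParquetOutputFormat.setSchema(writeJob, schema);

            // Reader job: an explicit read schema. It must be compatible,
            // field by field, with the write schema in the file footer.
            Job readJob = Job.getInstance(new Configuration(), "reader");
            AvroParquetInputFormat.setAvroReadSchema(readJob, schema);
        }
    }

If the AlignmentRecord class on the reader's classpath was generated from a
different version of the schema than the one used to write the file, the
converter will line the fields up in the wrong order, which could produce
exactly the ClassCastException you're seeing.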

rb

On Fri, Mar 4, 2016 at 3:58 AM, Luca Pireddu <[email protected]> wrote:

> Hello all,
>
> I'm using AvroParquetOutputFormat and AvroParquetInputFormat for a
> pair of Hadoop applications -- one that writes avro-parquet and one
> that reads.  I'm actually using Pydoop (https://github.com/crs4/pydoop),
> but the I/O itself is done through the AvroParquet classes.
>
> The writer seems to succeed.  The reader, however, crashes with a
> ParquetDecodingException when processing the other application's output.
> Here's the syslog output with the stack trace:
>
>
> 2016-03-04 12:46:50,075 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: ParquetInputSplit{part: hdfs://localhost:9000/user/pireddu/seqal_mini_ref_bwamem_avo_output/tmp/part-m-00000.parquet start: 0 end: 16916 length: 16916 hosts: []}
> 2016-03-04 12:46:50,846 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://localhost:9000/user/pireddu/seqal_mini_ref_bwamem_avo_output/tmp/part-m-00000.parquet
>     at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
>     at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
>     at it.crs4.pydoop.mapreduce.pipes.PydoopAvroBridgeReaderBase.initialize(PydoopAvroBridgeReaderBase.java:66)
>     at it.crs4.pydoop.mapreduce.pipes.PydoopAvroBridgeValueReader.initialize(PydoopAvroBridgeValueReader.java:38)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:545)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:783)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ClassCastException: org.bdgenomics.formats.avro.Contig cannot be cast to java.lang.Integer
>     at org.bdgenomics.formats.avro.AlignmentRecord.put(AlignmentRecord.java:258)
>     at org.apache.parquet.avro.AvroIndexedRecordConverter.set(AvroIndexedRecordConverter.java:168)
>     at org.apache.parquet.avro.AvroIndexedRecordConverter.access$000(AvroIndexedRecordConverter.java:46)
>     at org.apache.parquet.avro.AvroIndexedRecordConverter$1.add(AvroIndexedRecordConverter.java:95)
>     at org.apache.parquet.avro.AvroIndexedRecordConverter.end(AvroIndexedRecordConverter.java:189)
>     at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:413)
>     at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:218)
>     ... 11 more
>
>
> I'm using parquet 1.8.1 and avro 1.7.6. I'm able to read the parquet
> file with parquet-tools-1.8.1, so I'm inclined to think that the file
> is valid.
>
> Contig is the first class defined in my avro schema:
>
> file schema: org.bdgenomics.formats.avro.AlignmentRecord
>
> --------------------------------------------------------------------------------
> contig:                               OPTIONAL F:6
> .contigName:                          OPTIONAL BINARY O:UTF8 R:0 D:2
> .contigLength:                        OPTIONAL INT64 R:0 D:2
> ...and so on.
>
> Can someone suggest what might be causing the problem when reading?
> Any help would be appreciated!
>
> Thanks,
>
> Luca
>



-- 
Ryan Blue
Software Engineer
Netflix
