That's the tip I needed!

I had modified an existing schema without removing the original,
which left two record definitions with the same name in the same
namespace. My apps ended up using one version when writing and the
other when reading, which caused the mismatch.
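
For the archives: after deleting the duplicate definition, it also
helps to pin both jobs to the one remaining schema so the reader's
expected schema can't silently diverge from what the writer embeds.
A minimal sketch, assuming the mapreduce AvroParquet classes and the
Avro-generated AlignmentRecord; the SchemaPinning wrapper and Job
names are just for illustration, not from my actual apps:

import org.apache.avro.Schema;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroParquetInputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;
import org.bdgenomics.formats.avro.AlignmentRecord;

// Illustrative wrapper class, not from the original applications.
public class SchemaPinning {
    // Point both sides of the pipeline at the single surviving schema,
    // so the reader's expected schema matches the writer's.
    static void configure(Job writeJob, Job readJob) {
        Schema schema = AlignmentRecord.getClassSchema();
        // Writer: this schema gets embedded in the Parquet file footer.
        AvroParquetOutputFormat.setSchema(writeJob, schema);
        // Reader: state the expected (reader) schema explicitly rather
        // than letting it be resolved against a stale duplicate.
        AvroParquetInputFormat.setAvroReadSchema(readJob, schema);
    }
}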

Thanks for the help,

Luca



On 4 March 2016 at 18:42, Ryan Blue <[email protected]> wrote:
> Luca,
>
> What are your reader and writer schemas? It looks like they may not match
> because the reader expects an Integer but is deserializing a Contig object.
>
> rb
>
> On Fri, Mar 4, 2016 at 3:58 AM, Luca Pireddu <[email protected]> wrote:
>
>> Hello all,
>>
>> I'm using AvroParquetOutputFormat and AvroParquetInputFormat for a
>> pair of Hadoop applications -- one that writes avro-parquet and one
>> that reads it.  Strictly speaking, I'm using Pydoop
>> (https://github.com/crs4/pydoop), but the actual I/O is done through
>> the AvroParquet classes.
>>
>> The writer seems to succeed.  The reader, however, crashes with a
>> ParquetDecodingException when processing the writer's output.
>> Here's the syslog output with the stack trace:
>>
>>
>> 2016-03-04 12:46:50,075 INFO [main] org.apache.hadoop.mapred.MapTask:
>> Processing split: ParquetInputSplit{part:
>> hdfs://localhost:9000/user/pireddu/seqal_mini_ref_bwamem_avo_output/tmp/part-m-00000.parquet
>> start: 0 end: 16916 length: 16916 hosts: []}
>> 2016-03-04 12:46:50,846 WARN [main] org.apache.hadoop.mapred.YarnChild:
>> Exception running child :
>> org.apache.parquet.io.ParquetDecodingException: Can not read value at
>> 1 in block 0 in file
>> hdfs://localhost:9000/user/pireddu/seqal_mini_ref_bwamem_avo_output/tmp/part-m-00000.parquet
>> at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
>> at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
>> at it.crs4.pydoop.mapreduce.pipes.PydoopAvroBridgeReaderBase.initialize(PydoopAvroBridgeReaderBase.java:66)
>> at it.crs4.pydoop.mapreduce.pipes.PydoopAvroBridgeValueReader.initialize(PydoopAvroBridgeValueReader.java:38)
>> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:545)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:783)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>> Caused by: java.lang.ClassCastException:
>> org.bdgenomics.formats.avro.Contig cannot be cast to java.lang.Integer
>> at org.bdgenomics.formats.avro.AlignmentRecord.put(AlignmentRecord.java:258)
>> at org.apache.parquet.avro.AvroIndexedRecordConverter.set(AvroIndexedRecordConverter.java:168)
>> at org.apache.parquet.avro.AvroIndexedRecordConverter.access$000(AvroIndexedRecordConverter.java:46)
>> at org.apache.parquet.avro.AvroIndexedRecordConverter$1.add(AvroIndexedRecordConverter.java:95)
>> at org.apache.parquet.avro.AvroIndexedRecordConverter.end(AvroIndexedRecordConverter.java:189)
>> at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:413)
>> at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:218)
>> ... 11 more
>>
>>
>> I'm using Parquet 1.8.1 and Avro 1.7.6.  I'm able to read the Parquet
>> file with parquet-tools-1.8.1, so I'm inclined to think that the file
>> itself is valid.
>>
>> Contig is the first record defined in my Avro schema; here is the
>> schema dump from parquet-tools:
>>
>> file schema:
>> org.bdgenomics.formats.avro.AlignmentRecord
>>
>> --------------------------------------------------------------------------------
>> contig:                               OPTIONAL F:6
>> .contigName:                          OPTIONAL BINARY O:UTF8 R:0 D:2
>> .contigLength:                        OPTIONAL INT64 R:0 D:2
>> ...and so on.
>>
>> Can someone suggest what might be causing the problem when reading?
>> Any help would be appreciated!
>>
>> Thanks,
>>
>> Luca
>>
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
