Re: Issue with parquet file reader

pratik khadloya Fri, 29 Aug 2014 16:48:58 -0700

Found the problem. It was a bug in my code (the 1st version): I was not
reading the current definition level every time I called "consume()".
I fixed that and now it works. I have updated the comments on the gist on GitHub.
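The fix described above means checking the definition level before every read and only then calling consume(). The sketch below illustrates that idea; it is not the code from the gist. SimpleColumnReader and InMemoryColumnReader are hypothetical stand-ins, loosely modeled on the per-value methods of parquet-mr's ColumnReader (getTotalValueCount, getCurrentDefinitionLevel, consume), so that it runs without the library. With the real API you would call getBinary() instead of the hypothetical getString(), and take the maximum level from the column's ColumnDescriptor (getMaxDefinitionLevel()).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal interface, loosely modeled on parquet-mr's ColumnReader.
// getString() stands in for the real getBinary().
interface SimpleColumnReader {
    long getTotalValueCount();
    int getCurrentDefinitionLevel(); // level of the value at the cursor
    String getString();              // only valid when the value is present
    void consume();                  // advance the cursor to the next value
}

final class ColumnScan {
    // Reads every value in the column, returning null for missing values.
    // The definition level is re-read for EACH value: a value is present
    // only when its level equals the column's max definition level.
    static List<String> readAll(SimpleColumnReader reader, int maxDefinitionLevel) {
        List<String> values = new ArrayList<>();
        long total = reader.getTotalValueCount();
        for (long i = 0; i < total; i++) {
            if (reader.getCurrentDefinitionLevel() == maxDefinitionLevel) {
                values.add(reader.getString());
            } else {
                values.add(null); // value is null: nothing to decode
            }
            reader.consume();
        }
        return values;
    }
}

// Hypothetical in-memory reader for trying the loop; null entries mark missing values.
final class InMemoryColumnReader implements SimpleColumnReader {
    private final String[] data;
    private int pos = 0;

    InMemoryColumnReader(String... data) { this.data = data; }

    public long getTotalValueCount() { return data.length; }

    public int getCurrentDefinitionLevel() { return data[pos] == null ? 0 : 1; }

    public String getString() {
        if (data[pos] == null) {
            throw new IllegalStateException("no value at position " + pos);
        }
        return data[pos];
    }

    public void consume() { pos++; }
}
```

For example, ColumnScan.readAll(new InMemoryColumnReader("a", null, "c"), 1) yields [a, null, c]. If the level were read only once before the loop, every value would look present and the first null would make getString() throw, which is loosely analogous to the decoding error quoted further down.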


Thanks,
Pratik


On Fri, Aug 29, 2014 at 2:56 PM, pratik khadloya <[email protected]>
wrote:

> A similar issue was reported here
> https://issues.apache.org/jira/browse/DRILL-827
> I'm not quite sure what fix they made.
>
>
> On Fri, Aug 29, 2014 at 8:09 AM, pratik khadloya <[email protected]>
> wrote:
>
>> Hello,
>>
>> I have written the following two column readers for Parquet. The first
>> one opens a Parquet file once and reads all columns; the second one
>> re-opens the Parquet file for every column it reads.
>>
>> With the first one, I get an exception while reading some columns:
>>
>> Exception in thread "main" parquet.io.ParquetDecodingException: Can't
>> read value in column [description] BINARY at value 44899 out of 57096,
>> 44899 out of 57096 in currentPage. repetition level: 0, definition level: 1
>>
>> *1st:* https://gist.github.com/tispratik/f0044dd84dc8d8c6cbcf
>>
>>
>> With the second one, I do not get any exception, but re-opening the
>> file for every column is not efficient.
>>
>> *2nd:* https://gist.github.com/tispratik/f7a66f6a40b7ae3b98ad
>>
>> Does anyone know what's going on here? I suspect a bug in the
>> ParquetFileReader class: it may be keeping some state that it does not
>> fully flush out.
>>
>> Any help is appreciated.
>>
>> Thanks,
>> Pratik
>>
>
>
