Found the problem. It was a bug in my code (the 1st version): I was not fetching the current definition level each time I called "consume()", so a stale level was being used for later values. I fixed that and now it works. I have updated the comments on the gist on GitHub.
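
For anyone who hits the same exception, here is a minimal, self-contained sketch of the pattern. It is not the code from the gist and it does not use the parquet library: FakeColumnReader is a hypothetical stand-in whose method names only mirror the ColumnReader calls involved (getCurrentDefinitionLevel, getBinary, consume, getTotalValueCount). The point it illustrates is that the definition level has to be re-read for every slot, because a value is only stored when the level equals the column's maximum.

```java
import java.util.ArrayList;
import java.util.List;

public class DefinitionLevelDemo {

    // Hypothetical stand-in for a column reader; NOT parquet's real class.
    static final class FakeColumnReader {
        private final int[] definitionLevels; // one level per slot (null or not)
        private final String[] storedValues;  // only non-null values are stored
        private final int maxDefinitionLevel;
        private int slot = 0;
        private int valueIndex = 0;

        FakeColumnReader(int[] definitionLevels, String[] storedValues,
                         int maxDefinitionLevel) {
            this.definitionLevels = definitionLevels;
            this.storedValues = storedValues;
            this.maxDefinitionLevel = maxDefinitionLevel;
        }

        long getTotalValueCount() {
            return definitionLevels.length;
        }

        int getCurrentDefinitionLevel() {
            return definitionLevels[slot];
        }

        String getBinary() {
            // Reading a value at a null slot is an error in this sketch.
            if (definitionLevels[slot] < maxDefinitionLevel) {
                throw new IllegalStateException("no value stored at slot " + slot);
            }
            return storedValues[valueIndex];
        }

        void consume() {
            // Advance past the stored value only if this slot had one.
            if (definitionLevels[slot] == maxDefinitionLevel) {
                valueIndex++;
            }
            slot++;
        }
    }

    // Reads every slot, fetching the definition level again on each iteration.
    static List<String> readColumn(FakeColumnReader reader, int maxDefinitionLevel) {
        List<String> out = new ArrayList<>();
        long total = reader.getTotalValueCount();
        for (long i = 0; i < total; i++) {
            int level = reader.getCurrentDefinitionLevel(); // re-read per slot
            if (level == maxDefinitionLevel) {
                out.add(reader.getBinary());
            } else {
                out.add(null); // the value is null at this slot
            }
            reader.consume();
        }
        return out;
    }

    public static void main(String[] args) {
        FakeColumnReader reader = new FakeColumnReader(
                new int[] {1, 0, 1}, new String[] {"a", "b"}, 1);
        System.out.println(readColumn(reader, 1)); // prints [a, null, b]
    }
}
```

If the level were read once before the loop and reused, the second slot above would be treated as holding a value and the read would fail, which is the kind of mistake I had made.
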
Thanks,
Pratik

On Fri, Aug 29, 2014 at 2:56 PM, pratik khadloya <[email protected]> wrote:

> A similar issue was reported here
> https://issues.apache.org/jira/browse/DRILL-827
> Not quite sure about the fix they made.
>
>
> On Fri, Aug 29, 2014 at 8:09 AM, pratik khadloya <[email protected]>
> wrote:
>
>> Hello,
>>
>> I have written the following two column readers for parquet, the first
>> one opens a parquet file once and reads all columns and the second one
>> re-opens the parquet file for every column it reads.
>>
>> With the first one, i get an exception while reading some columns.
>>
>> Exception in thread "main" parquet.io.ParquetDecodingException: Can't
>> read value in column [description] BINARY at value 44899 out of 57096,
>> 44899 out of 57096 in currentPage. repetition level: 0, definition level: 1
>>
>> *1st:* https://gist.github.com/tispratik/f0044dd84dc8d8c6cbcf
>>
>>
>> With the second one, i do not get any exception. But this way of reading
>> the columns by re-opening the file for every column is not efficient.
>>
>> *2nd:* https://gist.github.com/tispratik/f7a66f6a40b7ae3b98ad
>>
>> Does anyone know whats going on here. I suspect a bug in the
>> ParquetFileReader class where it is storing some state which it is not able
>> to flush out completely.
>>
>> Any help is appreciated.
>>
>> Thanks,
>> Pratik
>>
>
>
