[ https://issues.apache.org/jira/browse/PARQUET-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153496#comment-15153496 ]
Deepak Majeti commented on PARQUET-531:
---------------------------------------

This will be resolved by the upcoming patches for PARQUET-526 and PARQUET-532.

> Can't read past first page in a column
> --------------------------------------
>
>          Key: PARQUET-531
>          URL: https://issues.apache.org/jira/browse/PARQUET-531
>      Project: Parquet
>   Issue Type: Bug
>   Components: parquet-cpp
>  Environment: Ubuntu Linux 14.04 (no obvious platform dependence),
>               Parquet file created by Apache Spark 1.5.0 on the same platform.
>     Reporter: Spiro Michaylov
>     Assignee: Deepak Majeti
>  Attachments: part-r-00031-e5d9a4ef-d73e-406c-8c2f-9ad1f20ebf8e.gz.parquet
>
>
> Building the code as of 2/14/2015 and adding the obvious three lines of code
> to serialized-page.cc to enable the newly added CompressionCodec::GZIP:
> {code}
>       case parquet::CompressionCodec::GZIP:
>         decompressor_.reset(new GZipCodec());
>         break;
> {code}
> I try to run the parquet_reader example on the column I'm about to attach,
> which was created by Apache Spark 1.5.0. It works surprisingly well until it
> hits the end of the first page, where it dies with
> {quote}
> Parquet error: Value was non-null, but has not been buffered
> {quote}
> I realize you may be reluctant to look at this because (a) the GZip support
> is new and (b) I had to modify the code to enable it, but actually things
> seem to decompress just fine (congratulations: this is awesome!): looking at
> the problem in the debugger and tracing through a bit it seems to me like the
> buffering is a bit screwed up in general -- some kind of confusion between
> the buffering at the Scanner and Reader levels. I can reproduce the problem
> by reading through just a single column too.
> It fails after 128 rows, which is suspicious given this line in
> column/scanner.h:
> {code}
> DEFAULT_SCANNER_BATCH_SIZE = 128;
> {code}
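
For readers unfamiliar with the Scanner/Reader split mentioned in the report, here is a minimal, self-contained sketch of a batching scanner on top of a paged reader. It is not parquet-cpp code: ToyReader, ToyScanner, and CountScanned are hypothetical names, and the sketch assumes a reader whose ReadBatch never crosses a page boundary in a single call. It only illustrates the intended behaviour, where the scanner refills its 128-value buffer whenever it runs dry, so scanning continues past the first page. It makes no claim about the actual cause of the bug.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column reader: values are grouped into
// fixed-size pages, and one ReadBatch call never spans two pages.
class ToyReader {
 public:
  ToyReader(std::vector<int64_t> values, std::size_t page_size)
      : values_(std::move(values)), page_size_(page_size) {
    assert(page_size_ > 0);
  }

  // Copies up to batch_size values into out and returns how many were
  // copied. A return value of 0 means the column is exhausted.
  std::size_t ReadBatch(std::size_t batch_size, int64_t* out) {
    if (pos_ >= values_.size()) return 0;
    const std::size_t page_end =
        std::min(values_.size(), (pos_ / page_size_ + 1) * page_size_);
    const std::size_t n = std::min(batch_size, page_end - pos_);
    std::copy(values_.data() + pos_, values_.data() + pos_ + n, out);
    pos_ += n;
    return n;
  }

 private:
  std::vector<int64_t> values_;
  std::size_t page_size_;
  std::size_t pos_ = 0;
};

// Hypothetical stand-in for a scanner: hands out one value at a time
// from an internal buffer and refills it from the reader when empty.
class ToyScanner {
 public:
  explicit ToyScanner(ToyReader* reader, std::size_t batch_size = 128)
      : reader_(reader), buffer_(batch_size) {
    assert(batch_size > 0);
  }

  // Returns false once the reader has no more values.
  bool Next(int64_t* value) {
    if (buffer_pos_ == buffered_) {
      // Buffer consumed: ask the reader for the next batch, which may
      // come from the next page.
      buffered_ = reader_->ReadBatch(buffer_.size(), buffer_.data());
      buffer_pos_ = 0;
      if (buffered_ == 0) return false;
    }
    *value = buffer_[buffer_pos_++];
    return true;
  }

 private:
  ToyReader* reader_;
  std::vector<int64_t> buffer_;
  std::size_t buffered_ = 0;
  std::size_t buffer_pos_ = 0;
};

// Scans a column holding 0, 1, 2, ... and returns how many values came
// back in the expected order.
std::size_t CountScanned(std::size_t num_values, std::size_t page_size) {
  std::vector<int64_t> values(num_values);
  for (std::size_t i = 0; i < num_values; ++i) {
    values[i] = static_cast<int64_t>(i);
  }
  ToyReader reader(std::move(values), page_size);
  ToyScanner scanner(&reader);
  std::size_t count = 0;
  int64_t v = 0;
  while (scanner.Next(&v)) {
    if (v != static_cast<int64_t>(count)) break;  // out-of-order value
    ++count;
  }
  return count;
}
```

With this structure, calling CountScanned(300, 128) walks through three pages. A scanner that stopped refilling after its first batch would instead stall at 128 values, which is the symptom the report describes.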