[
https://issues.apache.org/jira/browse/PARQUET-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wes McKinney resolved PARQUET-531.
----------------------------------
Resolution: Fixed
This was fixed in https://github.com/apache/parquet-cpp/pull/62. I verified
that invoking {{parquet_reader}} on the attached file now prints the contents
without failing. Thank you!
> Can't read past first page in a column
> --------------------------------------
>
> Key: PARQUET-531
> URL: https://issues.apache.org/jira/browse/PARQUET-531
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Environment: Ubuntu Linux 14.04 (no obvious platform dependence),
> Parquet file created by Apache Spark 1.5.0 on the same platform.
> Reporter: Spiro Michaylov
> Assignee: Deepak Majeti
> Attachments:
> part-r-00031-e5d9a4ef-d73e-406c-8c2f-9ad1f20ebf8e.gz.parquet
>
>
> I built the code as of 2/14/2015 and added the obvious three lines of code
> to serialized-page.cc to enable the newly added CompressionCodec::GZIP:
> {code}
> case parquet::CompressionCodec::GZIP:
>   decompressor_.reset(new GZipCodec());
>   break;
> {code}
> I then ran the parquet_reader example on the file I'm about to attach,
> which was created by Apache Spark 1.5.0. It works surprisingly well until it
> hits the end of the first page, where it dies with
> {quote}
> Parquet error: Value was non-null, but has not been buffered
> {quote}
> I realize you may be reluctant to look at this because (a) the GZip support
> is new and (b) I had to modify the code to enable it, but things actually
> seem to decompress just fine (congratulations: this is awesome!). Looking at
> the problem in the debugger and tracing through a bit, it seems to me that
> the buffering is a bit screwed up in general: some kind of confusion between
> the buffering at the Scanner and Reader levels. I can reproduce the problem
> by reading through just a single column, too.
> It fails after 128 rows, which is suspicious given this line in
> column/scanner.h:
> {code}
> DEFAULT_SCANNER_BATCH_SIZE = 128;
> {code}
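
For readers following the report above, here is a minimal sketch of the interaction it describes: a scanner that buffers values in batches of 128 on top of a reader that has to cross page boundaries. It is not the parquet-cpp implementation or its fix; ToyReader, ToyScanner, kBatchSize and ScanAll are hypothetical names invented for illustration, and the pages hold plain int32 values with no nulls or compression. The point is only that the scanner must refill its batch whenever it runs dry, and the reader must advance to the next page while filling a batch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column reader (not the parquet-cpp API).
// It owns the pages and fills batches, moving to the next page as needed.
class ToyReader {
 public:
  explicit ToyReader(std::vector<std::vector<int32_t>> pages)
      : pages_(std::move(pages)) {}

  // Copies up to batch_size values into out and returns how many were copied.
  // Returns 0 only when every page has been consumed.
  std::size_t ReadBatch(std::size_t batch_size, int32_t* out) {
    std::size_t copied = 0;
    while (copied < batch_size && page_ < pages_.size()) {
      const std::vector<int32_t>& page = pages_[page_];
      if (pos_ == page.size()) {  // current page exhausted: advance
        ++page_;
        pos_ = 0;
        continue;
      }
      const std::size_t n = std::min(batch_size - copied, page.size() - pos_);
      std::copy(page.data() + pos_, page.data() + pos_ + n, out + copied);
      pos_ += n;
      copied += n;
    }
    return copied;
  }

 private:
  std::vector<std::vector<int32_t>> pages_;
  std::size_t page_ = 0;  // index of the current page
  std::size_t pos_ = 0;   // read position within the current page
};

// Hypothetical stand-in for a scanner: hands out one value at a time from a
// batch buffer and asks the reader for a new batch when the buffer is empty.
class ToyScanner {
 public:
  static constexpr std::size_t kBatchSize = 128;  // mirrors the 128 above

  explicit ToyScanner(ToyReader* reader)
      : reader_(reader), buffer_(kBatchSize) {}

  // Writes the next value to *value; returns false at the end of the column.
  bool Next(int32_t* value) {
    if (pos_ == buffered_) {  // batch consumed: refill before reading
      buffered_ = reader_->ReadBatch(kBatchSize, buffer_.data());
      pos_ = 0;
      if (buffered_ == 0) return false;
    }
    *value = buffer_[pos_++];
    return true;
  }

 private:
  ToyReader* reader_;
  std::vector<int32_t> buffer_;
  std::size_t buffered_ = 0;  // number of valid values in buffer_
  std::size_t pos_ = 0;       // next value to hand out
};

// Builds a column holding 0..num_values-1 split into pages of page_size
// (assumed > 0), scans it value by value, and returns what was read.
std::vector<int32_t> ScanAll(std::size_t num_values, std::size_t page_size) {
  std::vector<std::vector<int32_t>> pages;
  for (std::size_t start = 0; start < num_values; start += page_size) {
    pages.emplace_back();
    const std::size_t end = std::min(start + page_size, num_values);
    for (std::size_t i = start; i < end; ++i) {
      pages.back().push_back(static_cast<int32_t>(i));
    }
  }
  ToyReader reader(std::move(pages));
  ToyScanner scanner(&reader);
  std::vector<int32_t> out;
  int32_t value = 0;
  while (scanner.Next(&value)) out.push_back(value);
  return out;
}
```

Calling ScanAll(300, 100) exercises both boundaries at once: the batch boundary at row 128 and the page boundaries at rows 100 and 200. A scanner that skipped the refill step would stop returning values at row 128, which is the symptom described in the report.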