[
https://issues.apache.org/jira/browse/PARQUET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260618#comment-15260618
]
Piyush Narang commented on PARQUET-400:
---------------------------------------
Thought I'd chime in, as we ran into this issue as well when we tried to cut an
internal Parquet release from what is currently the latest SHA on master. We are
running plain HDFS (not S3), I think based off version 2.6.0. It would be great
if the fix handled HDFS as well. Looking at the current PR
(https://github.com/apache/parquet-mr/pull/306/files), it seems the blacklist
is only S3-based.
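The scheme-based blacklist being discussed could be sketched roughly as below. This is a minimal illustration, not the actual PR code: the class and method names are hypothetical, and including "hdfs" in the set reflects the request above rather than what the PR currently does.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ByteBufferReadBlacklist {
    // Filesystem schemes for which the ByteBuffer read path is disabled.
    // The current PR reportedly blacklists only the S3 variants; "hdfs" is
    // added here as an assumption, per the request in the comment above.
    private static final Set<String> BLACKLISTED_SCHEMES =
        new HashSet<>(Arrays.asList("s3", "s3a", "s3n", "hdfs"));

    /** Returns true if the ByteBuffer read path may be used for this file. */
    public static boolean useByteBufferReads(URI fileUri) {
        String scheme = fileUri.getScheme();
        return scheme == null || !BLACKLISTED_SCHEMES.contains(scheme.toLowerCase());
    }

    public static void main(String[] args) {
        // Falls back to the old read path for blacklisted filesystems.
        System.out.println(useByteBufferReads(URI.create("hdfs://nn:8020/data/f.parquet")));
        System.out.println(useByteBufferReads(URI.create("file:///tmp/f.parquet")));
    }
}
```

A check like this would sit wherever the reader chooses between the PARQUET-77 ByteBuffer path and the pre-existing stream path, keyed off the file's URI scheme.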
> Error reading some files after PARQUET-77 bytebuffer read path
> --------------------------------------------------------------
>
> Key: PARQUET-400
> URL: https://issues.apache.org/jira/browse/PARQUET-400
> Project: Parquet
> Issue Type: Bug
> Reporter: Jason Altekruse
> Assignee: Jason Altekruse
> Attachments: bytebyffer_read_fail.gz.parquet
>
>
> This issue is based on a discussion on the list started by [~dweeks]
> Full discussion:
> https://mail-archives.apache.org/mod_mbox/parquet-dev/201512.mbox/%3CCAMpYv7C_szTheua9N95bXvbd2ROmV63BFiJTK-K-aDNK6ZNBKA%40mail.gmail.com%3E
> From the thread (he later provided a small repro file that is attached here):
> Just wanted to see if you or anyone else has run into problems reading
> files after the ByteBuffer patch. I've been running into issues and have
> narrowed it down to the ByteBuffer commit using a small repro file (written
> with 1.6.0, unfortunately can't share the data).
> It doesn't happen for every file, but those that fail give this error:
> can not read class org.apache.parquet.format.PageHeader: Required field
> 'uncompressed_page_size' was not found in serialized data! Struct:
> PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)