[jira] [Commented] (PARQUET-400) Error reading some files after PARQUET-77 bytebuffer read path

Jason Altekruse (JIRA) Tue, 08 Dec 2015 07:55:34 -0800

    [ 
https://issues.apache.org/jira/browse/PARQUET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046993#comment-15046993
 ]


Jason Altekruse commented on PARQUET-400:
-----------------------------------------

[~dweeks] I built parquet at the bytebuffer commit 
(6b605a4ea05b66e1a6bf843353abcb4834a4ced8) and tried reading this file using 
parquet-tools as well as Drill. Neither gave me the exception when reading 
this file. Are you reading the file off of local disk? We have seen different 
behaviors when reading files through different FileSystem implementations, as 
some of them take advantage of the ability to return fewer bytes than the 
requested read length. In a few places in the Drill parquet reader we had 
neglected to handle this case, and it showed up when we started querying 
files on S3. That being said, I don't even know if you can provide a remote 
path to parquet-tools, so I assume you were just reading this off of local 
disk, in which case we'll have to do some more digging.
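
For anyone following along, here is a minimal sketch of the short-read case 
described above. It is not the Parquet or Drill code, and I am not claiming 
it is the cause of this issue. ShortReadDemo, ChunkedInputStream and 
readFully are hypothetical names made up for illustration, and the 7-byte 
chunk size is arbitrary. The stream stands in for a FileSystem 
implementation that returns fewer bytes than requested per call, and 
readFully shows the loop a reader needs in order to cope with that.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical illustration only; not taken from Parquet or Drill.
class ShortReadDemo {

    /** Simulates a stream that hands back at most maxChunk bytes per read call. */
    static class ChunkedInputStream extends InputStream {
        private final InputStream in;
        private final int maxChunk;

        ChunkedInputStream(byte[] data, int maxChunk) {
            this.in = new ByteArrayInputStream(data);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Allowed by the InputStream contract: return fewer bytes than requested.
            return in.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** Keeps calling read until len bytes have arrived or the stream ends early. */
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) {
                throw new EOFException("Stream ended after " + total + " of " + len + " bytes");
            }
            total += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }

        // A single read call may leave most of the buffer unfilled.
        byte[] naive = new byte[data.length];
        int got = new ChunkedInputStream(data, 7).read(naive, 0, naive.length);
        System.out.println("single read returned " + got + " of " + naive.length + " bytes");

        // Looping until the requested length is reached fills the whole buffer.
        byte[] full = new byte[data.length];
        readFully(new ChunkedInputStream(data, 7), full, 0, full.length);
        System.out.println("readFully matches source: " + Arrays.equals(data, full));
    }
}
```

A reader that trusts a single read call to fill its buffer would go on to 
parse leftover zero bytes, which is why this only tends to show up with 
FileSystem implementations that actually return short reads.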

Let me know if you can think of any config that may be different on your 
machine or any other leads.



> Error reading some files after PARQUET-77 bytebuffer read path
> --------------------------------------------------------------
>
>                 Key: PARQUET-400
>                 URL: https://issues.apache.org/jira/browse/PARQUET-400
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Jason Altekruse
>            Assignee: Jason Altekruse
>         Attachments: bytebyffer_read_fail.gz.parquet
>
>
> This issue is based on a discussion on the list started by [~dweeks].
> Full discussion:
> https://mail-archives.apache.org/mod_mbox/parquet-dev/201512.mbox/%3CCAMpYv7C_szTheua9N95bXvbd2ROmV63BFiJTK-K-aDNK6ZNBKA%40mail.gmail.com%3E
> From the thread (he later provided a small repro file that is attached here):
> Just wanted to see if you or anyone else has run into problems reading
> files after the ByteBuffer patch.  I've been running into issues and have
> narrowed it down to the ByteBuffer commit using a small repro file (written
> with 1.6.0, unfortunately can't share the data).
> It doesn't happen for every file, but those that fail give this error:
> can not read class org.apache.parquet.format.PageHeader: Required field
> 'uncompressed_page_size' was not found in serialized data! Struct:
> PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
