[jira] [Updated] (DRILL-2842) Parquet files with large file metadata sometimes fail to read in the FooterGather

Jason Altekruse (JIRA) Tue, 21 Apr 2015 18:05:47 -0700

     [ 
https://issues.apache.org/jira/browse/DRILL-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jason Altekruse updated DRILL-2842:
-----------------------------------
    Description: Parquet files with large footers could not be read. The length 
of the footer is written at the end of the file. To avoid extra reads for 
smaller files, we first read a reasonably sized chunk from the end of the file 
that may contain the whole footer; the exact footer length appears at the end 
of that chunk. If the footer is longer than the chunk, we read the remaining 
portion that precedes what was already read and splice the two together. The 
offset at which the bytes from the first read were placed was wrong.  (was: The 
issue was a small mismatch in the read position, subtracting the wrong length 
from the size of the file to place the bytes from the original read operation.)

> Parquet files with large file metadata sometimes fail to read in the 
> FooterGather
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-2842
>                 URL: https://issues.apache.org/jira/browse/DRILL-2842
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Jason Altekruse
>            Assignee: Jason Altekruse
>            Priority: Critical
>         Attachments: 2842.patch
>
>
> Parquet files with large footers could not be read. The length of the footer 
> is written at the end of the file. To avoid extra reads for smaller files, 
> we first read a reasonably sized chunk from the end of the file that may 
> contain the whole footer; the exact footer length appears at the end of that 
> chunk. If the footer is longer than the chunk, we read the remaining portion 
> that precedes what was already read and splice the two together. The offset 
> at which the bytes from the first read were placed was wrong.
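
The read-and-splice scheme described above can be sketched roughly as follows.
This is an illustration only, not the code from 2842.patch or from Drill's
footer reader. It assumes the standard Parquet tail layout (footer bytes, then
a 4-byte little-endian footer length, then the 4-byte magic "PAR1"). The
function name read_footer and the initial_read default are hypothetical.

```python
import io
import struct

MAGIC = b"PAR1"
TAIL_SIZE = 8  # 4-byte little-endian footer length + 4-byte magic


def read_footer(f, file_size, initial_read=64):
    """Return the raw footer bytes of a Parquet-style file (sketch)."""
    if file_size < TAIL_SIZE:
        raise ValueError("file too small to contain a footer")

    # Speculative first read: grab the tail of the file in one request.
    first_len = min(initial_read, file_size)
    f.seek(file_size - first_len)
    first = f.read(first_len)

    if first[-4:] != MAGIC:
        raise ValueError("missing trailing magic bytes")
    (footer_len,) = struct.unpack("<I", first[-8:-4])

    total = footer_len + TAIL_SIZE  # footer + length field + magic
    if total > file_size:
        raise ValueError("footer length exceeds file size")

    if total <= first_len:
        # The whole footer was already covered by the first read.
        return first[first_len - total:first_len - TAIL_SIZE]

    # Otherwise read only the missing bytes that precede the first read.
    missing = total - first_len
    buf = bytearray(total)
    f.seek(file_size - total)
    buf[:missing] = f.read(missing)

    # The bytes from the first read belong directly after the newly read
    # ones, i.e. at offset total - first_len. Getting this offset wrong is
    # the kind of mistake the issue describes.
    buf[missing:] = first
    return bytes(buf[:footer_len])
```

With initial_read=64 and a 512-byte footer, the second branch runs: 456 bytes
are read from in front of the first chunk, and the first chunk is copied in
at offset 456.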



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
