[
https://issues.apache.org/jira/browse/PARQUET-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Deepak Majeti reassigned PARQUET-505:
-------------------------------------
Assignee: Deepak Majeti
> Column reader: automatically handle large data pages
> ----------------------------------------------------
>
> Key: PARQUET-505
> URL: https://issues.apache.org/jira/browse/PARQUET-505
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Reporter: Wes McKinney
> Assignee: Deepak Majeti
>
> Currently, we only support data pages whose headers are 64K or less
> (see {{parquet/column/serialized-page.cc}}). Since page headers can be
> essentially arbitrarily large (in pathological cases) because of the page
> statistics, if deserializing the page header fails, we should attempt to
> read a progressively larger amount of file data in an effort to find the
> end of the page header.
> As part of this (and to make testing easier!), the maximum data page header
> size should be configurable. We can write test cases by defining appropriate
> Statistics structs to yield serialized page headers of whatever desired size.
> On malformed files, we may run past the end of the file; in such cases we
> should raise a reasonable exception.
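
The retry strategy described above could be sketched as follows. This is a
hypothetical illustration, not the parquet-cpp implementation: the names
{{TryDeserializeHeader}}, {{ReadPageHeader}}, and the size parameters are
invented for the example, and a real version would call the Thrift
deserializer against the bytes actually read from the file.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-in for Thrift page-header deserialization: succeeds only when the
// buffer holds the complete header (actual_header_size bytes). In the real
// reader this would be a Thrift protocol read that fails on truncated input.
static bool TryDeserializeHeader(const std::vector<uint8_t>& buf,
                                 size_t actual_header_size) {
  return buf.size() >= actual_header_size;
}

// Sketch of the proposed loop: start with a small read and double it on
// each deserialization failure, up to a configurable maximum header size
// (addressing the testability point above). `file_bytes_left` models how
// much data remains in the file; exhausting it on a malformed file raises
// an exception, as the issue requests. Returns the parsed header size.
size_t ReadPageHeader(size_t actual_header_size, size_t file_bytes_left,
                      size_t initial_read = 64 * 1024,
                      size_t max_header_size = 16 * 1024 * 1024) {
  size_t allowed = initial_read;
  while (true) {
    // Pretend we read min(allowed, file_bytes_left) bytes from the file.
    size_t to_read = std::min(allowed, file_bytes_left);
    std::vector<uint8_t> buf(to_read);
    if (TryDeserializeHeader(buf, actual_header_size)) {
      return actual_header_size;  // header fully parsed
    }
    if (to_read == file_bytes_left) {
      throw std::runtime_error("Malformed file: EOF while reading page header");
    }
    if (allowed >= max_header_size) {
      throw std::runtime_error("Page header exceeds configured maximum size");
    }
    allowed = std::min(allowed * 2, max_header_size);  // grow and retry
  }
}
```

With these assumptions, a 100 KB header succeeds after one doubling of the
initial 64K read, while a header extending past end-of-file throws instead
of looping forever.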
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)