[jira] [Commented] (PARQUET-370) Nested records are not properly read if none of their fields are requested

Ryan Blue (JIRA) Fri, 20 Nov 2015 14:57:30 -0800

    [ 
https://issues.apache.org/jira/browse/PARQUET-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018994#comment-15018994
 ]


Ryan Blue commented on PARQUET-370:
-----------------------------------

[~julienledem], what do you think about getting this in for 1.9.0? It seems 
like a minor issue that could be put off, depending on how complicated the fix is.

> Nested records are not properly read if none of their fields are requested
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-370
>                 URL: https://issues.apache.org/jira/browse/PARQUET-370
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.5.0, 1.6.0, 1.7.0, 1.8.1
>            Reporter: Cheng Lian
>
> Say we have a Parquet file {{F}} with the following schema {{S1}}:
> {noformat}
> message root {
>   required group n {
>     optional int32 a;
>     optional int32 b;
>   }
> }
> {noformat}
> Later on, as the schema evolves, fields {{a}} and {{b}} are removed, while 
> {{c}} and {{d}} are added. Now we have schema {{S2}}:
> {noformat}
> message root {
>   required group n {
>     optional int32 c;
>     optional int32 d;
>   }
> }
> {noformat}
> {{S1}} and {{S2}} are compatible, so it should be OK to read {{F}} with 
> {{S2}} as requested schema.
> Say {{F}} contains a single record:
> {noformat}
> {"n": {"a": 1, "b": 2}}
> {noformat}
> When reading {{F}} with {{S2}}, the expected output is:
> {noformat}
> {"n": {"c": null, "d": null}}
> {noformat}
> But currently parquet-mr gives:
> {noformat}
> {"n": null}
> {noformat}
> This is because {{MessageColumnIO}} finds that the physical Parquet file 
> contains no leaf columns defined in the requested schema, and shortcuts 
> record reading with an {{EmptyRecordReader}} for column {{n}}. See 
> [here|https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.1/parquet-column/src/main/java/org/apache/parquet/io/MessageColumnIO.java#L97-L99].
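
To make the difference between the expected and the reported result concrete, here is a minimal toy model in plain Python. It is only a sketch and not parquet-mr code: schemas are modeled as nested dicts with None marking a primitive leaf, and the names project, leaf_paths, and project_with_shortcut are hypothetical helpers invented for illustration. The shortcut function is an assumption about the behavior described above (no shared leaf columns leads to an empty read), not a port of MessageColumnIO.

```python
# Toy model of schema projection. This is NOT parquet-mr code.
# A schema maps field name -> nested dict (group) or None (primitive leaf).
S1 = {"n": {"a": None, "b": None}}  # schema the file F was written with
S2 = {"n": {"c": None, "d": None}}  # requested schema after evolution

record = {"n": {"a": 1, "b": 2}}    # the single record stored in F


def project(rec, requested):
    """Project rec onto the requested schema.

    Leaves missing from rec become None; a group that is present in rec
    stays present even when none of its requested leaves exist in rec.
    """
    out = {}
    for name, sub in requested.items():
        value = rec.get(name)
        if sub is None:
            out[name] = value                # primitive leaf, may be None
        elif value is None:
            out[name] = None                 # group itself is absent
        else:
            out[name] = project(value, sub)  # group present, recurse
    return out


def leaf_paths(schema, prefix=()):
    """Yield the path of every primitive leaf in a schema."""
    for name, sub in schema.items():
        path = prefix + (name,)
        if sub is None:
            yield path
        else:
            yield from leaf_paths(sub, path)


def project_with_shortcut(rec, requested, file_schema):
    """Hypothetical model of the reported behavior: when the file shares no
    leaf column with the requested schema, skip reading and emit nulls."""
    shared = set(leaf_paths(requested)) & set(leaf_paths(file_schema))
    if not shared:
        return {name: None for name in requested}
    return project(rec, requested)


expected = project(record, S2)
reported = project_with_shortcut(record, S2, S1)
```

In this model, expected keeps the required group n with null leaves, while reported collapses n itself to null, which mirrors the two outputs quoted in the description.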



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
