[
https://issues.apache.org/jira/browse/PARQUET-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018994#comment-15018994
]
Ryan Blue commented on PARQUET-370:
-----------------------------------
[~julienledem], what do you think about getting this in for 1.9.0? It seems
like a minor issue and could be put off, depending on how complicated the fix is.
> Nested records are not properly read if none of their fields are requested
> --------------------------------------------------------------------------
>
> Key: PARQUET-370
> URL: https://issues.apache.org/jira/browse/PARQUET-370
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.5.0, 1.6.0, 1.7.0, 1.8.1
> Reporter: Cheng Lian
>
> Say we have a Parquet file {{F}} with the following schema {{S1}}:
> {noformat}
> message root {
>   required group n {
>     optional int32 a;
>     optional int32 b;
>   }
> }
> {noformat}
> Later on, as the schema evolves, fields {{a}} and {{b}} are removed, while
> {{c}} and {{d}} are added. Now we have schema {{S2}}:
> {noformat}
> message root {
>   required group n {
>     optional int32 c;
>     optional int32 d;
>   }
> }
> {noformat}
> {{S1}} and {{S2}} are compatible, so it should be OK to read {{F}} with
> {{S2}} as requested schema.
> Say {{F}} contains a single record:
> {noformat}
> {"n": {"a": 1, "b": 2}}
> {noformat}
> When reading {{F}} with {{S2}}, the expected output is:
> {noformat}
> {"n": {"c": null, "d": null}}
> {noformat}
> But parquet-mr currently gives:
> {noformat}
> {"n": null}
> {noformat}
> This is because {{MessageColumnIO}} finds that the physical Parquet file
> contains no leaf columns defined in the requested schema, and short-circuits
> record reading with an {{EmptyRecordReader}} for column {{n}}. See
> [here|https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.1/parquet-column/src/main/java/org/apache/parquet/io/MessageColumnIO.java#L97-L99].
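
To make the difference between the expected and the reported behaviour concrete, here is a minimal Python sketch. It is a toy model only, not parquet-mr code: the dict-based schema encoding and the names leaf_paths, project and read_with_shortcut are hypothetical, and every group is treated as required. It assumes the shortcut is triggered whenever the file schema and the requested schema share no leaf column, as the description above states.

```python
# Toy model (hypothetical, not the parquet-mr API): a group is a dict that maps
# field names to either the string "int32" (a leaf) or a nested group dict.
S1 = {"n": {"a": "int32", "b": "int32"}}  # schema the file F was written with
S2 = {"n": {"c": "int32", "d": "int32"}}  # requested schema


def leaf_paths(schema, prefix=()):
    """Return the set of leaf column paths, e.g. {("n", "a"), ("n", "b")}."""
    paths = set()
    for name, field in schema.items():
        path = prefix + (name,)
        if isinstance(field, dict):
            paths |= leaf_paths(field, path)
        else:
            paths.add(path)
    return paths


def project(record, requested):
    """Expected semantics: keep every requested group, fill missing leaves with None.

    Assumption: all groups are required, so a group is always materialized.
    """
    out = {}
    for name, field in requested.items():
        if isinstance(field, dict):
            out[name] = project(record.get(name, {}), field)
        else:
            out[name] = record.get(name)
    return out


def read_with_shortcut(record, file_schema, requested):
    """Reported behaviour: no shared leaf column means nothing nested is read."""
    if not (leaf_paths(file_schema) & leaf_paths(requested)):
        # Stand-in for the EmptyRecordReader shortcut: group n is never built.
        return {name: None for name in requested}
    return project(record, requested)


record = {"n": {"a": 1, "b": 2}}
expected = project(record, S2)                # {"n": {"c": None, "d": None}}
actual = read_with_shortcut(record, S1, S2)   # {"n": None}
```

In this model the shortcut only changes the result when the two schemas have no leaf in common; as soon as one requested leaf exists in the file, both functions agree.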
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)