[ https://issues.apache.org/jira/browse/PARQUET-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved PARQUET-1652.
-----------------------------------
    Fix Version/s:     (was: cpp-1.6.0)
       Resolution: Not A Problem

The problem ended up being with the record delimiting logic. I'm submitting a fix under ARROW-5630.

> [C++] ColumnWriter writes incorrect "num_values" metadata for nested types
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-1652
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1652
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>            Priority: Major
>
> While investigating ARROW-5630, I discovered that we are writing incorrect
> "num_values" metadata in {{DataPageHeader}} when writing nested types.
> Instead of writing "Number of values, including NULLs, in this data page" as
> the specification in parquet.thrift says, we are writing the number of
> definition levels. For flat types, the number of definition levels and the
> number of values with nulls are the same, but for nested types the number of
> values with nulls will generally be smaller.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
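
The quoted description contrasts two counts for a nested column: the number of definition levels and the number of values including nulls. The sketch below is only an illustration of that contrast, not parquet-cpp code. It assumes a hypothetical column of type optional list of optional int32 using the standard three-level list layout (maximum definition level 3), and `encode_levels` is a made-up helper name. It makes no claim about which count belongs in `num_values`; the issue itself was closed as "Not A Problem", with the actual fault traced to the record delimiting logic.

```python
from typing import List, Optional, Tuple

# Assumed schema: optional list<optional int32>, standard 3-level layout.
#   def level 0 -> the list itself is null
#   def level 1 -> the list is present but empty
#   def level 2 -> the element slot exists but the element is null
#   def level 3 -> the element is present
MAX_DEF_LEVEL = 3

Row = Optional[List[Optional[int]]]


def encode_levels(rows: List[Row]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: compute Dremel-style (definition, repetition) levels."""
    def_levels: List[int] = []
    rep_levels: List[int] = []
    for row in rows:
        if row is None:
            def_levels.append(0)
            rep_levels.append(0)
        elif len(row) == 0:
            def_levels.append(1)
            rep_levels.append(0)
        else:
            for i, item in enumerate(row):
                def_levels.append(2 if item is None else 3)
                # Repetition level 0 marks the first entry of a new record.
                rep_levels.append(0 if i == 0 else 1)
    return def_levels, rep_levels


rows: List[Row] = [[1, None, 2], [], None]
def_levels, rep_levels = encode_levels(rows)

# One level entry per leaf slot, plus one for each empty or null list.
num_levels = len(def_levels)
# Leaf slots only (present or null elements): the "values with nulls" count
# as the quoted description reads it.
num_leaf_slots = sum(1 for d in def_levels if d >= MAX_DEF_LEVEL - 1)
# Elements that are actually present.
num_non_null = sum(1 for d in def_levels if d == MAX_DEF_LEVEL)
# Records start wherever the repetition level is 0.
num_records = sum(1 for r in rep_levels if r == 0)

print(num_levels, num_leaf_slots, num_non_null, num_records)
```

For a flat optional column every row contributes exactly one definition level and one value slot, so the first two counts coincide. In this nested example the empty list and the null list each contribute a definition level without a leaf slot, which is why the counts diverge.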