[
https://issues.apache.org/jira/browse/PARQUET-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113509#comment-15113509
]
Wes McKinney commented on PARQUET-459:
--------------------------------------
Related to PARQUET-435. The main issue here is that there are typically more
repetition/definition levels than values, because a null entry still has
levels but no stored value. For both nested and flat data iteration, it would
probably be better to retrieve a batch of repetition/definition levels and a
batch of values, and leave the iteration and null inference to the downstream
application developer.
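To make the batch idea concrete, here is a rough sketch of what the
downstream null inference could look like for a flat column. Nothing in it is
parquet-cpp API: the names Cell and ExpandNulls are hypothetical, and it
assumes a non-repeated column where a row is null exactly when its definition
level is below the column's maximum definition level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row representation: a value plus an explicit null flag.
struct Cell {
  bool is_null;
  int32_t value;  // only meaningful when is_null is false
};

// Hypothetical helper, not part of parquet-cpp. Takes one batch of
// definition levels (one per row) and one batch of densely packed non-null
// values, and expands them into one Cell per row.
// Assumption: flat column, so def_level < max_def_level means null.
std::vector<Cell> ExpandNulls(const std::vector<int16_t>& def_levels,
                              const std::vector<int32_t>& values,
                              int16_t max_def_level) {
  std::vector<Cell> rows;
  rows.reserve(def_levels.size());
  std::size_t next_value = 0;  // index of the next unread non-null value
  for (int16_t level : def_levels) {
    if (level == max_def_level) {
      // Row is defined: consume the next stored value.
      assert(next_value < values.size());
      rows.push_back(Cell{false, values[next_value++]});
    } else {
      // Row is null: no value was stored for it.
      rows.push_back(Cell{true, 0});
    }
  }
  return rows;
}
```

With definition levels {1, 0, 1, 0, 0}, values {10, 20} and a maximum
definition level of 1, this yields five rows, of which only the first and
third are non-null. Nested data would additionally need the repetition
levels, which this sketch ignores.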
I think the idea behind returning a default (undefined) value in the current
API is that the caller looks at the definition level to determine whether the
returned value is actually null. At a minimum, we need some unit tests for the
current API.
I'm going to implement PARQUET-435 and PARQUET-451 together and will let you
know.
> Improve handling of null values
> -------------------------------
>
> Key: PARQUET-459
> URL: https://issues.apache.org/jira/browse/PARQUET-459
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Reporter: Deepak Majeti
>
> Currently, the type's default value is returned for NULL values, which is
> incorrect.
> This JIRA will correctly identify a NULL value with the help of an additional
> variable that will be set for NULL values.
> This feature depends on reading the repetition level (PARQUET-169).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)