alamb opened a new issue, #11280:
URL: https://github.com/apache/datafusion/issues/11280
@marvinlanhenke @alamb We always flatten the data page stats
iterator - following the pattern from the initial PR:
https://github.com/apache/datafusion/pull/10852/files#diff-7110f4709c105a18ef74a212396444d62052179a735d148fb62470a8b157fb40R582
But I'm wondering if flatten is the right thing to do here?
The min or max values for each page will be None if all the values on the
page happen to be null:
https://github.com/apache/arrow-rs/blob/master/parquet/src/file/page_index/index.rs#L37-L44
Using flatten in this case drops those None entries, so won't the result
be shorter than the number of data pages? Rather than flatten, should we
instead map over the values, unwrapping each Some and turning each None
into a null value?
(It's entirely possible I'm misunderstanding something here, if so,
apologies in advance!)
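
To make the concern concrete, here is a minimal sketch using only the standard library. The function names and the use of `Option<i32>` as a stand-in for a page's min value are assumptions for illustration; this is not the actual DataFusion or arrow-rs code.

```rust
/// Hypothetical stand-in for per-page min statistics: `None` means every
/// value on that page was null, so no min was recorded.
///
/// `flatten` silently drops the `None` entries, so the output can be
/// shorter than the number of data pages.
fn flattened_mins(page_mins: &[Option<i32>]) -> Vec<i32> {
    page_mins.iter().copied().flatten().collect()
}

/// Keeps one entry per data page: `Some(v)` stays a value and `None`
/// stays in place as a null, so the output length matches the page count.
fn null_preserving_mins(page_mins: &[Option<i32>]) -> Vec<Option<i32>> {
    page_mins.iter().copied().collect()
}
```

With three pages where the middle one is all null, the flattened version yields two values while the null-preserving version yields three, which is what I would expect a per-page statistics array to contain.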
_Originally posted by @efredine in
https://github.com/apache/datafusion/issues/10922#issuecomment-2209376864_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]