efredine commented on issue #11280: URL: https://github.com/apache/datafusion/issues/11280#issuecomment-2211168001
OK, quick update: I had misunderstood some of what was going on. I think there may still be a problem, but it's different from what I originally thought.

My misunderstanding: the top-level flatten is there because we have an iterator of iterators. The inner iterator iterates over the page indexes within a ColumnIndex, and it returns an Option<T>. The original PR also added an explicit test case for a data page whose values are all null: https://github.com/apache/datafusion/blob/main/datafusion/core/tests/parquet/arrow_statistics.rs#L475-L504

However, the tests for the other data types don't cover this scenario, and there are several places where we use a filter_map in the inner loop that should probably just be a map. I need to write tests to prove this first, and then expand the coverage if it turns out to be the case.
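To illustrate the filter_map vs. map concern, here is a minimal sketch. It is not the DataFusion code. HypotheticalColumnIndex, mins_filter_map and mins_map are hypothetical stand-ins, and the sketch assumes each data page contributes one Option value, with None for a page that contains only nulls. If downstream code expects one statistics entry per data page, filter_map silently drops the all-null pages and shifts every later entry, while map keeps a None in that page's slot.

```rust
// Hypothetical stand-in for a ColumnIndex: one Option<i64> min per data page.
// None means the page contains only nulls, so no min was recorded.
struct HypotheticalColumnIndex {
    page_mins: Vec<Option<i64>>,
}

// Outer iterator over column indexes, inner iterator over pages.
// flat_map is the same as map followed by the top-level flatten.
// filter_map in the inner iterator discards None, so the output
// no longer has one entry per page.
fn mins_filter_map(indexes: &[HypotheticalColumnIndex]) -> Vec<i64> {
    indexes
        .iter()
        .flat_map(|idx| idx.page_mins.iter().filter_map(|m| *m))
        .collect()
}

// A plain map (here .copied()) keeps each Option as-is, preserving
// one entry per page, including None for all-null pages.
fn mins_map(indexes: &[HypotheticalColumnIndex]) -> Vec<Option<i64>> {
    indexes
        .iter()
        .flat_map(|idx| idx.page_mins.iter().copied())
        .collect()
}

fn main() {
    // Two column chunks; the second page of the first one is all nulls.
    let indexes = vec![
        HypotheticalColumnIndex { page_mins: vec![Some(1), None, Some(7)] },
        HypotheticalColumnIndex { page_mins: vec![Some(3)] },
    ];

    // 4 pages in total, but filter_map yields only 3 values.
    println!("filter_map: {:?}", mins_filter_map(&indexes)); // [1, 7, 3]
    println!("map:        {:?}", mins_map(&indexes)); // [Some(1), None, Some(7), Some(3)]
}
```

A test that puts an all-null page in the middle of a column, as in this example, is the kind of case that would show the misalignment for each data type.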
