asfimport commented on issue #406: URL: https://github.com/apache/parquet-format/issues/406#issuecomment-2184154146
[Jan Finis](https://issues.apache.org/jira/browse/PARQUET-2249?#comment-17691108) / @jfinis: @wgtmac True, not writing a column index in this case is also a solution. Note, though, that this is a pessimization for the pages in the same column chunk that do not contain NaN: it would be a shame if a single NaN made a whole column chunk non-indexable. It might be a good interim solution, but it is not very satisfying. NaN handling in Parquet as a whole currently seems incomplete and somewhat inconsistent, which makes columns with NaNs mostly unusable for scan pruning. Maybe the semantics should be redefined in a new version so that columns with NaNs can be indexed like any other column. As mentioned, Iceberg has solved this problem by providing NaN counts.
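
To illustrate the NaN-count idea, here is a minimal sketch of page-level pruning when min/max are computed over non-NaN values only and a separate NaN count is kept per page. It is not Parquet's or Iceberg's actual API: `PageStats`, `page_stats`, and `may_contain_greater_than` are hypothetical names, and the sketch assumes IEEE comparison semantics, where `NaN > x` is always false. An engine that orders NaN above all other values would also have to keep pages with `nan_count > 0`.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageStats:
    """Hypothetical per-page statistics; min/max ignore NaN values."""
    min_value: Optional[float]  # None when the page has no non-NaN value
    max_value: Optional[float]
    nan_count: int


def page_stats(values: List[float]) -> PageStats:
    """Compute min/max over non-NaN values and count NaNs separately."""
    non_nan = [v for v in values if not math.isnan(v)]
    return PageStats(
        min_value=min(non_nan) if non_nan else None,
        max_value=max(non_nan) if non_nan else None,
        nan_count=len(values) - len(non_nan),
    )


def may_contain_greater_than(stats: PageStats, threshold: float) -> bool:
    """Return False only when no value in the page can satisfy x > threshold.

    Assumes IEEE semantics: NaN never satisfies a comparison, so a page
    holding only NaNs can be skipped for this predicate.
    """
    if stats.max_value is None:
        return False
    return stats.max_value > threshold


# Three pages of one column chunk; only the second and third contain NaN.
pages = [
    [1.0, 2.0, 3.0],
    [4.0, float("nan"), 6.0],
    [float("nan")],
]
stats = [page_stats(p) for p in pages]

# Indexes of the pages that must be read for the predicate `x > 3.5`.
selected = [i for i, s in enumerate(stats) if may_contain_greater_than(s, 3.5)]
```

With statistics shaped like this, the NaN in the second page does not prevent the first page from being pruned, and the NaN count still tells a reader which pages to visit for an `is NaN` predicate.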
