asfimport commented on issue #406:
URL: https://github.com/apache/parquet-format/issues/406#issuecomment-2184154146

   [Jan Finis](https://issues.apache.org/jira/browse/PARQUET-2249?#comment-17691108) / @jfinis:
   @wgtmac True, not writing a column index in this case is also a solution. 
Note, though, that this is a pessimization for the pages in the same column 
chunk that contain no NaN. It would be a shame if a single NaN made a whole 
column chunk non-indexable. It might be a good interim solution, but it's not 
very satisfying.
   
   NaN handling in Parquet as a whole currently seems to be incomplete and 
somewhat inconsistent, making columns with NaNs mostly unusable for scan 
pruning. Maybe a new version should redefine the semantics so that columns 
with NaNs can be indexed like other columns. As mentioned, Iceberg has solved 
this problem by providing NaN counts.
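
   To make the NaN-count idea concrete, here is a rough sketch in Python. It 
is only an illustration under assumptions: `PageStats`, `page_stats`, and 
`may_contain_greater_than` are hypothetical names, and this is neither the 
Parquet column index layout nor Iceberg's actual metadata. The idea is that 
min/max are computed over non-NaN values only, and a separate NaN count 
records what was left out, so pages without NaN stay prunable.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageStats:
    """Hypothetical per-page statistics (not a real Parquet/Iceberg struct)."""
    min_value: Optional[float]  # min over non-NaN values, None if there are none
    max_value: Optional[float]  # max over non-NaN values, None if there are none
    nan_count: int              # number of NaN values in the page


def page_stats(values: List[float]) -> PageStats:
    """Compute min/max while ignoring NaNs, and count the NaNs separately."""
    non_nan = [v for v in values if not math.isnan(v)]
    return PageStats(
        min_value=min(non_nan) if non_nan else None,
        max_value=max(non_nan) if non_nan else None,
        nan_count=len(values) - len(non_nan),
    )


def may_contain_greater_than(stats: PageStats, threshold: float) -> bool:
    """Return False only if no value in the page can satisfy x > threshold.

    Assumes IEEE 754 comparison semantics, where NaN > threshold is always
    false, so NaNs never match and only the non-NaN max matters.
    """
    return stats.max_value is not None and stats.max_value > threshold


nan = float("nan")
pages = [[1.0, 2.0], [3.0, nan], [nan, nan]]
stats = [page_stats(p) for p in pages]

# Pages that must be read for the predicate x > 2.5.
to_read = [may_contain_greater_than(s, 2.5) for s in stats]
```

   With statistics like these, the NaN in the second page does not stop the 
first page from being skipped, and a reader looking for NaNs can consult 
`nan_count` instead of guessing from min/max.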


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


