etseidl commented on PR #221: URL: https://github.com/apache/parquet-format/pull/221#issuecomment-2802072714
> So, how do we continue from here? Do we have enough implementations now to start a vote? Or what is still missing?

Thanks @JFinis! I think what's needed now is for each of the PoC engines to write a file with the new ordering (perhaps `alltypes_tiny_pages.parquet` from parquet-testing), and then for a query engine (one of Spark, DataFusion, or DuckDB) to run a query like `select * ... where float_col is NaN` and show that the physical plan uses the new stats and results in no pages being read. (Perhaps a `NaN` could be introduced in one page.) This would demonstrate interoperability and utility.

After that, I'd probably kick off a thread on the dev list to see if there are any last concerns before holding a vote.
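To make the pruning check concrete, here is a rough Python sketch of the kind of decision a reader could make for an `is NaN` filter. It is not taken from the PR or from any engine: it assumes per-page min/max are computed under an IEEE 754 total order, and the names `total_order_key`, `page_stats`, `page_may_contain_nan`, and `pages_to_read` are hypothetical. Under that assumption, a page can contain a NaN only if its max sorts above +inf or its min sorts below -inf.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Map a double to an int whose ordering matches IEEE 754 totalOrder."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # For negative values, flip every bit except the sign bit so that
    # larger magnitudes sort lower; non-negative values are left as-is.
    return bits ^ ((bits >> 63) & 0x7FFFFFFFFFFFFFFF)


POS_INF_KEY = total_order_key(math.inf)
NEG_INF_KEY = total_order_key(-math.inf)


def page_stats(values):
    """Hypothetical per-page statistics: (min key, max key) under total order."""
    keys = [total_order_key(v) for v in values]
    return min(keys), max(keys)


def page_may_contain_nan(stats) -> bool:
    """A NaN sorts above +inf or below -inf, so check the page bounds only."""
    lo, hi = stats
    return hi > POS_INF_KEY or lo < NEG_INF_KEY


def pages_to_read(pages):
    """Indices of pages that cannot be skipped for an `is NaN` filter."""
    return [i for i, p in enumerate(pages) if page_may_contain_nan(page_stats(p))]


# Three made-up pages; only the second one has a NaN introduced.
pages = [
    [1.0, 2.5, -3.0],
    [0.0, float("nan"), 7.0],
    [-math.inf, math.inf],
]
print(pages_to_read(pages))  # [1]
```

With no NaN in any page, `pages_to_read` returns an empty list, which corresponds to the "no pages being read" result described above.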
