Re: [PR] PARQUET-2249: Introduce IEEE 754 total order [parquet-format]

via GitHub Mon, 14 Apr 2025 08:22:12 -0700


etseidl commented on PR #221:
URL: https://github.com/apache/parquet-format/pull/221#issuecomment-2802072714


   > So, how do we continue from here? Do we have enough implementations now to 
start a vote? Or what is still missing?
   
   Thanks @JFinis! I think what's needed now is for each of the PoC engines to 
write a file with the new ordering (perhaps `alltypes_tiny_pages.parquet` from 
parquet-testing), and then for a query engine (one of Spark, DataFusion, or 
DuckDB) to run a query like `select * ... where float_col is NaN` and show that 
the physical plan uses the new stats and results in no pages being read. 
(Perhaps a `NaN` could be introduced in one page.) This would demonstrate 
interoperability and utility.
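
As a rough illustration of why total-order stats could let an engine skip pages for an `is NaN` predicate, here is a minimal sketch in Python. It is not taken from the PR or from any engine: the names `total_order_key` and `page_may_contain_nan` are hypothetical, it assumes a float64 column, and it assumes the page min/max were computed under IEEE 754 total order (so any NaN sorts below `-inf` or above `+inf`).

```python
import struct

def total_order_key(x: float) -> int:
    """Map a float64 to an int whose ordering follows IEEE 754 totalOrder."""
    # Reinterpret the 8 bytes of the double as a signed 64-bit integer.
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # For values with the sign bit set, flip the lower 63 bits so that
    # larger magnitudes sort lower (-NaN < -inf < ... < -0.0).
    if bits < 0:
        bits ^= 0x7FFFFFFFFFFFFFFF
    return bits

POS_INF_KEY = total_order_key(float("inf"))
NEG_INF_KEY = total_order_key(float("-inf"))

def page_may_contain_nan(page_min: float, page_max: float) -> bool:
    """Hypothetical pruning check for `float_col is NaN`.

    Under total order, a NaN in the page must show up as a max above +inf
    or as a min below -inf, so a page whose bounds are both ordinary
    numbers can be skipped.
    """
    return (total_order_key(page_max) > POS_INF_KEY
            or total_order_key(page_min) < NEG_INF_KEY)
```

With bounds like `(1.0, 5.0)` the check returns `False`, so the page would not need to be read; a page whose max is a positive NaN returns `True`.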
   
   After that, I'd probably kick off a thread on the dev list to see if there 
are any last concerns before holding a vote.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


