alamb commented on PR #156:
URL: https://github.com/apache/parquet-site/pull/156#issuecomment-3872308313

   > It's been pointed out to me that the coverage matrix doesn't cover 
statistics/geometry bounding, without which predicate pushdown doesn't work: 
every rowgroup with the column needs scanning.
   
   
   > "Geospatial support in Parquet is still ongoing; as of February 2026,
column statistics collection is incomplete, which means that scanning some
types may require reading all the data. Furthermore, the query engines
themselves need to adopt the new format extensions."
   
   Maybe a more accurate summary is that the column statistics collection is 
not yet fully integrated into all engines. 
   
   FWIW the Rust Parquet implementation does handle such statistics (thanks to
@kylebarron and @paleolimbot, as I recall):
https://docs.rs/parquet/latest/parquet/format/struct.GeospatialStatistics.html,
and I think SedonaDB has already integrated it into their query engine as well.
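   To show why these statistics matter for predicate pushdown, here is a minimal sketch of bounding-box row group pruning. It is plain Python and is not the API of the Rust `parquet` crate or of any other Parquet library. `BBox` and `row_groups_to_scan` are hypothetical names. The sketch assumes planar 2D coordinates and ignores antimeridian wraparound. A row group with no bounding box cannot be ruled out, so without statistics every row group gets scanned.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BBox:
    """Hypothetical planar 2D bounding box for one row group (no wraparound)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two axis-aligned boxes overlap unless one lies entirely
        # to one side of the other (touching edges count as overlap).
        return not (
            self.xmax < other.xmin
            or other.xmax < self.xmin
            or self.ymax < other.ymin
            or other.ymax < self.ymin
        )


def row_groups_to_scan(stats: List[Optional[BBox]], query: BBox) -> List[int]:
    """Indices of row groups that might hold geometries overlapping `query`.

    A row group whose bounding box is missing (None) cannot be pruned,
    so it is always kept.
    """
    return [
        i
        for i, bbox in enumerate(stats)
        if bbox is None or bbox.intersects(query)
    ]


# Usage: the second row group is pruned; the third has no stats and is kept.
stats = [BBox(0, 0, 10, 10), BBox(20, 20, 30, 30), None]
print(row_groups_to_scan(stats, BBox(5, 5, 8, 8)))  # [0, 2]
```

   With no statistics at all (`[None, None, ...]`), the function returns every index. That is the "every row group needs scanning" case described in the quote above.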
   
   Perhaps we can add a line for these to the
https://parquet.apache.org/docs/file-format/implementationstatus/ page (listing
a feature there seems to put pressure on the rest of the ecosystem to adopt it).
   

