alamb opened a new issue, #158: URL: https://github.com/apache/parquet-site/issues/158
This ticket tries to capture the discussion with @steveloughran, @csringhofer, myself and others on https://github.com/apache/parquet-site/pull/156#pullrequestreview-3772364529

> It's been pointed out to me that the coverage matrix doesn't cover statistics/geometry bounding, without which predicate pushdown doesn't work: every rowgroup with the column needs scanning.

The core point as I understand it is that there are several features that must be implemented in software libraries to realize the full benefits of the new Geometry and Geography types in Parquet. Specifically mentioned were

- Logical type annotation (to know which columns hold Geometry and Geography types) <-- this is what the page currently reflects
- Statistics implementation (e.g. the bounding boxes, and potentially different algorithms to compute them)
- Query engine implementation (e.g. using the bounding box statistics to prune Parquet files at query time)

There are probably more.

## Suggestions

One idea is to add more specific detail on https://parquet.apache.org/docs/file-format/implementationstatus/ .

<img width="938" height="81" alt="Image" src="https://github.com/user-attachments/assets/947066eb-ed56-4e89-8a81-e30e24989d32" />

Perhaps it would be appropriate to add a specific line for the geography/geometry statistics, for example

In addition to making the current implementation status clearer, red X's on the page seem to have the effect of pressuring additional ecosystem adoption.
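
To make the query engine point concrete, here is a minimal sketch of how bounding box statistics could let a reader skip row groups. It is not taken from any Parquet implementation: `BoundingBox`, `RowGroup`, and `row_groups_to_scan` are hypothetical names, the boxes are assumed to be planar (Geometry, no antimeridian wraparound), and only x/y are considered. Note that a row group without statistics always has to be scanned, which is the situation the quoted comment describes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical planar bounding box (x/y only, no wraparound)."""
    xmin: float
    xmax: float
    ymin: float
    ymax: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two axis-aligned boxes overlap iff they overlap on both axes.
        return (
            self.xmin <= other.xmax
            and other.xmin <= self.xmax
            and self.ymin <= other.ymax
            and other.ymin <= self.ymax
        )


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical stand-in for row group metadata of one geometry column."""
    index: int
    bbox: Optional[BoundingBox]  # None when the writer emitted no statistics


def row_groups_to_scan(row_groups: List[RowGroup], query: BoundingBox) -> List[int]:
    """Return indices of row groups that may contain geometries in `query`."""
    return [
        rg.index
        for rg in row_groups
        # Without statistics nothing can be ruled out, so the group is kept.
        if rg.bbox is None or rg.bbox.intersects(query)
    ]


if __name__ == "__main__":
    groups = [
        RowGroup(0, BoundingBox(0.0, 10.0, 0.0, 10.0)),
        RowGroup(1, BoundingBox(20.0, 30.0, 20.0, 30.0)),
        RowGroup(2, None),  # no bounding box written
    ]
    print(row_groups_to_scan(groups, BoundingBox(5.0, 6.0, 5.0, 6.0)))
```

Under these assumptions, a library that only implements the logical type annotation behaves like row group 2 for every row group: nothing can be pruned, which is why a separate line for statistics on the status page seems useful.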
