wgtmac commented on PR #240: URL: https://github.com/apache/parquet-format/pull/240#issuecomment-2132509071
I think I have collected sufficient comments and suggestions on this proposal and have modified it with the following changes:

1. Removed explicit geo-specific metadata as much as possible and added an optional `metadata` field of string type. This makes adoption by Apache Iceberg easy: it only needs to enable the new logical type, without extra setup. GeoParquet can use the `metadata` field to carry its JSON-style column metadata.
2. Added a `GeometryStatistics` struct to store various statistics, including bbox, H3/S2 covering, and geometry types. They can be stored at page level, row-group level, and even file level (in the logical type metadata).
3. I tried to add native encoding (same as GeoParquet/GeoArrow) as well. But at the moment the Parquet spec does not allow statistics on non-leaf columns, so it might be hacky to implement `GeometryStatistics` for native encoding types. I'm open to suggestions on this.
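To illustrate the multi-level statistics in item 2, here is a minimal Python sketch of how per-page statistics could be rolled up into row-group (and, by the same merge, file-level) statistics. The names `GeometryStats`, `update`, `merge`, and `_union` are hypothetical and are not the Thrift definition in this PR. The sketch assumes planar coordinates, ignores antimeridian wrap-around, and omits coverings.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# (xmin, ymin, xmax, ymax); a hypothetical layout chosen for this sketch.
BBox = Tuple[float, float, float, float]


def _union(a: BBox, b: BBox) -> BBox:
    """Return the smallest box that contains both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class GeometryStats:
    """Illustrative stand-in for the proposed GeometryStatistics struct."""
    bbox: Optional[BBox] = None
    geometry_types: Set[str] = field(default_factory=set)

    def update(self, geom_type: str, coords: List[Tuple[float, float]]) -> None:
        """Fold one geometry (given as its vertex list) into the statistics."""
        xs = [x for x, _ in coords]
        ys = [y for _, y in coords]
        box = (min(xs), min(ys), max(xs), max(ys))
        self.bbox = box if self.bbox is None else _union(self.bbox, box)
        self.geometry_types.add(geom_type)

    def merge(self, other: "GeometryStats") -> "GeometryStats":
        """Combine two statistics objects, e.g. two pages into a row group."""
        if self.bbox is None:
            bbox = other.bbox
        elif other.bbox is None:
            bbox = self.bbox
        else:
            bbox = _union(self.bbox, other.bbox)
        return GeometryStats(bbox, self.geometry_types | other.geometry_types)


# Two pages, each with its own statistics.
page1 = GeometryStats()
page1.update("Point", [(1.0, 2.0)])
page2 = GeometryStats()
page2.update("LineString", [(0.0, 0.0), (3.0, 1.0)])

# Row-group statistics are the merge of the page statistics.
row_group = page1.merge(page2)
```

Because `merge` is associative, the same operation can be reused to fold row-group statistics into a file-level summary; a reader could then skip any page or row group whose `bbox` does not intersect the query window.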
