paleolimbot commented on PR #240:
URL: https://github.com/apache/parquet-format/pull/240#issuecomment-2118281448

   > If you rethink the design of GeoParquet, how can it do better if parquet 
format has some geospatial knowledge?
   
   The main reason the schema-level metadata had to exist is that
there was no way to attach anything custom at the column level, either to give
geometry-aware readers extra metadata about the column (CRS being the main one)
or to store global column statistics (bbox). Bounding boxes at the feature level
(worked around as a separate column) are the second somewhat ugly thing; that
column gives reasonable row group statistics for many things people might want to
store. It seems like this PR would solve most of that.
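
   To make the schema-level workaround concrete, here is a minimal sketch of
the kind of file-level key/value entry described above. It assumes the
GeoParquet convention of a JSON document stored under a `geo` key; the helper
name `build_geo_metadata` is hypothetical, and the CRS value is an abbreviated
placeholder rather than a complete PROJJSON object.

```python
import json


def build_geo_metadata(column, crs, bbox):
    """Build a file-level key/value entry describing one geometry column.

    Hypothetical helper: the field names follow the GeoParquet "geo"
    metadata convention as I understand it; check the spec before relying
    on them.
    """
    payload = {
        "version": "1.0.0",
        "primary_column": column,
        "columns": {
            column: {
                "encoding": "WKB",
                "geometry_types": [],  # empty list: geometry types not declared
                "crs": crs,            # column-level CRS, carried at file level
                "bbox": list(bbox),    # global [xmin, ymin, xmax, ymax]
            }
        },
    }
    # Parquet file-level key/value metadata is a flat map, so the nested
    # per-column structure has to be serialized into a single JSON value.
    return {b"geo": json.dumps(payload).encode("utf-8")}


# Placeholder CRS for illustration only, not a full PROJJSON definition.
example_crs = {"id": {"authority": "EPSG", "code": 4326}}
metadata = build_geo_metadata("geometry", example_crs, (-180.0, -90.0, 180.0, 90.0))
```

   The point of the sketch is that the CRS and bbox are logically properties of
one column, yet they end up in a single file-wide JSON blob that every
geometry-aware reader has to parse.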
   
   I am not sure that a new logical type will catch on to the extent that 
GeoParquet will, although I'm new to this community and I may be very wrong. 
The GeoParquet working group is enthusiastic and encodings/strategies for 
storing/querying geospatial datasets in a data lake context are evolving 
rapidly. Even though it is a tiny bit of a hack, using extra columns and 
schema-level metadata to encode these things is very flexible and lets 
implementations be built on top of a number of underlying readers/underlying 
versions of the Parquet format.
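
   As a rough illustration of why the extra-column approach works with any
reader, the sketch below prunes row groups using nothing but ordinary min/max
statistics on four bbox columns. The `RowGroupBBoxStats` class and both
function names are hypothetical, and the statistics are assumed to have been
read already by whatever Parquet library is in use.

```python
from dataclasses import dataclass


@dataclass
class RowGroupBBoxStats:
    """Min/max statistics of the per-feature bbox columns for one row group."""
    xmin_min: float  # minimum of the xmin column
    ymin_min: float  # minimum of the ymin column
    xmax_max: float  # maximum of the xmax column
    ymax_max: float  # maximum of the ymax column


def row_group_may_intersect(stats, query):
    """Return False only when no feature in the row group can touch the query box."""
    qxmin, qymin, qxmax, qymax = query
    return not (
        stats.xmax_max < qxmin
        or stats.xmin_min > qxmax
        or stats.ymax_max < qymin
        or stats.ymin_min > qymax
    )


def select_row_groups(all_stats, query):
    """Indices of the row groups that still need to be read for this query."""
    return [i for i, s in enumerate(all_stats) if row_group_may_intersect(s, query)]


stats = [
    RowGroupBBoxStats(0.0, 0.0, 10.0, 10.0),
    RowGroupBBoxStats(20.0, 20.0, 30.0, 30.0),
]
selected = select_row_groups(stats, (5.0, 5.0, 25.0, 8.0))
```

   Nothing here depends on a geometry-aware logical type, which is what lets
the same strategy run on older versions of the format. The sketch ignores
boxes that cross the antimeridian.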


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


