jornfranke commented on issue #2586: URL: https://github.com/apache/iceberg/issues/2586#issuecomment-1707233081
I think it is a good start. I would like your opinion on the following:

* From my point of view, we need to support some spatial metadata, especially the CRS. I propose to reuse the same metadata as in the geoparquet definition: https://geoparquet.org/releases/v1.0.0-rc.1/
* What do you propose as the underlying storage format? You mention three (geoparquet, spatialparquet and Geolake parquet) and you implemented geoparquet, geoparquet (bbox) and Geolake parquet. I propose to reduce this to one. At the moment it looks to me as if geoparquet has the largest community and is also supported in other systems (e.g. geopandas), which may make it easier to use in the Iceberg ecosystem (e.g. https://py.iceberg.apache.org/).

Generally, I propose a roadmap that starts with a simple first release, to make it easier for people from the Iceberg project to review and to get initial feedback from the Iceberg community, e.g.:

* First release: Storage backend geoparquet (including the geoparquet metadata). Supported ecosystem: Apache Sedona - Spark
* Second release: Add XZ partitioning. Supported ecosystem: Apache Sedona Spark and Flink, and PyIceberg.
* Third release: Include raster data (here the challenge is to split a big raster into multiple tiles that are transparently read as one, cf. https://sedona.apache.org/1.4.1/tutorial/storing-blobs-in-parquet/)...?

This is just an example; the details can be changed.
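To make the CRS metadata point above more concrete, here is a minimal sketch of the kind of per-column metadata that geoparquet stores as JSON under the file-level `geo` key. The helper name `build_geo_metadata` and its parameters are hypothetical, the field names reflect my reading of the geoparquet specification and should be checked against it, and the `crs` value is a simplified placeholder (the specification expects a full PROJJSON object).

```python
import json


def build_geo_metadata(column="geometry", epsg_code=4326,
                       geometry_types=("Point",), bbox=None):
    """Return a JSON string shaped like geoparquet's file-level "geo" metadata.

    Hypothetical helper for illustration only; not part of Iceberg, Sedona,
    or any geoparquet library.
    """
    column_meta = {
        "encoding": "WKB",
        "geometry_types": list(geometry_types),
        # Assumption: simplified placeholder. A real file would carry a
        # complete PROJJSON description of the CRS here.
        "crs": {"id": {"authority": "EPSG", "code": epsg_code}},
    }
    if bbox is not None:
        # Optional bounding box of the column: [xmin, ymin, xmax, ymax]
        column_meta["bbox"] = list(bbox)
    return json.dumps({
        "version": "1.0.0-rc.1",
        "primary_column": column,
        "columns": {column: column_meta},
    })


if __name__ == "__main__":
    print(build_geo_metadata(bbox=(-180.0, -90.0, 180.0, 90.0)))
```

If Iceberg reused this structure, a reader such as geopandas could in principle pick up the CRS from the data files without any Iceberg-specific logic, which is the main argument for adopting the existing definition rather than inventing a new one.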
