jiayuasu commented on code in PR #1162: URL: https://github.com/apache/sedona/pull/1162#discussion_r1436790084
##########
spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/GeoParquetMetaData.scala:
##########
@@ -14,12 +14,7 @@
 package org.apache.spark.sql.execution.datasources.parquet

Review Comment:
   Can we add a new data source that reads only the metadata of a parquet file? This is crucial for entry-level users who need to explore an unknown parquet file, including geoparquet. In our geoparquet case, it will let users see the projjson value, since we are not able to properly parse it to a known epsg code.

   I understand that a Spark DataFrame carries only its schema as metadata, and the schema cannot hold such information. So I suggest that we add a new data source named `geoparquet.metadata`, which loads this metadata using `ParquetFileReader`.

   One good example is from DuckDB: https://duckdb.org/docs/data/parquet/metadata.html

   This can be addressed in a separate PR.
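   To make the suggestion concrete, here is a minimal sketch of the footer read that such a data source could build on. It is not the proposed implementation: the object name `GeoParquetFooterSketch` and the helper `readGeoMetadata` are hypothetical, and it assumes the Hadoop and parquet-hadoop libraries are on the classpath and that the file stores its GeoParquet metadata under the `geo` key of the footer's key-value metadata, as the GeoParquet specification describes.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

// Hypothetical sketch, not part of Sedona: reads only the parquet footer,
// never the row groups.
object GeoParquetFooterSketch {

  // Returns the raw JSON stored under the "geo" footer key, or None when the
  // file is a plain parquet file without GeoParquet metadata.
  def readGeoMetadata(file: String, conf: Configuration = new Configuration()): Option[String] = {
    val reader = ParquetFileReader.open(HadoopInputFile.fromPath(new Path(file), conf))
    try {
      // Key-value metadata is a java.util.Map[String, String]; get returns
      // null for a missing key, which Option(...) turns into None.
      val keyValues = reader.getFooter.getFileMetaData.getKeyValueMetaData
      Option(keyValues.get("geo"))
    } finally {
      reader.close()
    }
  }

  def main(args: Array[String]): Unit = {
    args.headOption match {
      case None => println("usage: GeoParquetFooterSketch <path-to-parquet-file>")
      case Some(file) =>
        readGeoMetadata(file) match {
          case Some(json) => println(json) // includes the projjson CRS when one is present
          case None       => println(s"No 'geo' key found in the footer of $file")
        }
    }
  }
}
```

   A `geoparquet.metadata` data source would presumably wrap a read like this and return the values as DataFrame rows, one per file; how those rows are shaped is left open here.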
