jiayuasu commented on code in PR #1162:
URL: https://github.com/apache/sedona/pull/1162#discussion_r1436790084


##########
spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/GeoParquetMetaData.scala:
##########
@@ -14,12 +14,7 @@
 package org.apache.spark.sql.execution.datasources.parquet
 

Review Comment:
   Can we add a new data source that reads only the file-level metadata of a 
Parquet file? This is crucial for entry-level users exploring an unknown 
Parquet file, including GeoParquet. In our GeoParquet case, it would let users 
see the projjson value, since we are not able to properly parse it into a known 
EPSG code.
   
   I understand that a Spark DataFrame carries only the schema as its metadata, 
which cannot be used to hold such information.
   
   So I suggest that we add a new data source named `geoparquet.metadata`, 
which loads this metadata using `ParquetFileReader`. One good example is from 
DuckDB: https://duckdb.org/docs/data/parquet/metadata.html
   
   This can be addressed in a separate PR.
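
As a rough illustration of the footer read suggested above, here is a minimal sketch. It assumes `parquet-hadoop` and `hadoop-common` are on the classpath; the object name `GeoParquetFooterSketch` and its method names are hypothetical and not part of Sedona. It relies on the GeoParquet convention of storing its metadata as a JSON string under the `geo` key of the Parquet key-value metadata.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

// Hypothetical helper, not an existing Sedona API.
object GeoParquetFooterSketch {

  // Returns the file-level key-value metadata stored in the Parquet footer.
  def readKeyValueMetadata(
      path: String,
      conf: Configuration = new Configuration()): java.util.Map[String, String] = {
    val reader = ParquetFileReader.open(HadoopInputFile.fromPath(new Path(path), conf))
    // The footer is parsed when the file is opened, so the map stays usable after close().
    try reader.getFooter.getFileMetaData.getKeyValueMetaData
    finally reader.close()
  }

  // Extracts the GeoParquet JSON (which contains the projjson CRS), if present.
  def geoValue(kv: java.util.Map[String, String]): Option[String] =
    Option(kv.get("geo"))
}
```

A caller could print `GeoParquetFooterSketch.geoValue(GeoParquetFooterSketch.readKeyValueMetadata(somePath))` to inspect the CRS of an unfamiliar file. A real `geoparquet.metadata` data source would presumably wrap something like this per file and expose the result as rows.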
   



##########
docs/tutorial/sql.md:
##########
@@ -656,6 +656,30 @@ Since v`1.3.0`, Sedona natively supports writing 
GeoParquet file. GeoParquet can
 df.write.format("geoparquet").save(geoparquetoutputlocation + 
"/GeoParquet_File_Name.parquet")
 ```
 
+Since v`1.5.1`, Sedona supports writing GeoParquet files with custom 
GeoParquet spec version and crs.

Review Comment:
   Please also add 
   
   `You can find the projjson string of a specific CRS here: 
https://epsg.io/ (click the JSON option at the bottom of the page). You can also 
customize your projjson string as needed.`
   
   `Please note that Sedona currently cannot set/get a projjson string to/from 
a CRS. Its geoparquet reader will ignore the projjson metadata and you will 
have to set your CRS via ST_SetSRID after reading the file. Its geoparquet 
writer will not leverage the SRID field of a geometry so you will have to 
always set the geoparquet.crs option manually when writing the file.`
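
The workflow described in that note could be sketched as follows. This is an illustration only: it assumes a Spark session with Sedona's SQL functions already registered, and the object name `GeoParquetCrsSketch`, its method names, and all parameters are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical helper, not an existing Sedona API.
object GeoParquetCrsSketch {

  // Builds the SQL expression that assigns an SRID to a geometry column.
  def sridExpr(geomCol: String, srid: Int): String =
    s"ST_SetSRID($geomCol, $srid)"

  // The writer does not use the geometry's SRID, so the projjson is passed explicitly.
  def writeWithCrs(df: DataFrame, projjson: String, outputPath: String): Unit =
    df.write
      .format("geoparquet")
      .option("geoparquet.crs", projjson)
      .save(outputPath)

  // The reader ignores the projjson metadata, so the SRID is set after loading.
  def readWithSrid(
      spark: SparkSession,
      inputPath: String,
      geomCol: String,
      srid: Int): DataFrame =
    spark.read
      .format("geoparquet")
      .load(inputPath)
      .withColumn(geomCol, expr(sridExpr(geomCol, srid)))
}
```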



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
