umartin commented on PR #809: URL: https://github.com/apache/sedona/pull/809#issuecomment-1485090752
> BTW, is there a way that we could leverage Spark built-in JSON data source and ST_AsGeoJSON to save data back to GeoJSON file?

That's a good question. I haven't tried it, but in theory you could build the coordinate part with ST_AsGeoJSON and then build the rest of the JSON with the to_json function or the JSON data source. For this to work, there would need to be some option in Spark to not quote the output from ST_AsGeoJSON, since that output is already serialized JSON.

Processing GeoJSON in Spark SQL is a bit of a hack. GeoJSON is really the worst imaginable big data format. Since all rows (features) are wrapped in an envelope object, there is no way it can be written or read in parallel. Reading or writing that kind of format in Spark SQL means all data is pushed to a single task.
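To make the quoting problem concrete, here is a minimal sketch in plain Python (no Spark involved; the `geometry_json` string is just a stand-in for what ST_AsGeoJSON would return for one row). A generic JSON serializer treats that string as a plain value and quotes it, which is exactly the behavior you would need Spark's to_json or the JSON data source to suppress:

```python
import json

# Stand-in for the output of ST_AsGeoJSON for a single row:
# the geometry is already a serialized JSON string.
geometry_json = '{"type":"Point","coordinates":[1.0,2.0]}'

# Naive approach: the serializer quotes and escapes the string,
# so "geometry" becomes a JSON string, not a nested object.
naive_feature = json.dumps({"type": "Feature", "geometry": geometry_json})

# Workaround: parse the geometry string back into a structure first,
# so it is emitted as a proper nested object.
feature = json.dumps({"type": "Feature", "geometry": json.loads(geometry_json)})

print(naive_feature)  # geometry ends up as an escaped string
print(feature)        # geometry is a proper nested JSON object
```

The round-trip through `json.loads` is wasteful at scale, which is why a "don't quote this column, it is already JSON" option on the writer side would be the cleaner path.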
