Sorry to keep sending emails about Sedona. I am having trouble passing a field list when converting a Spark DataFrame to a SpatialRDD.
When I checked the Python code, as you can see, it should take 3 parameters but takes only 2 (dataframe, fieldNames); it does not have geometryFieldName. Can you check this too?

Thanks,

[cid:[email protected]]

From: Jia Yu <[email protected]>
Sent: Friday, August 12, 2022 4:49 PM
To: [email protected]
Cc: Seo, Baewon <[email protected]>
Subject: Re: Sedona question/feature - writing SpatialRDDs as single GeoJSON file

[External]

Hi Oisin,

Spark / Sedona by design is not supposed to generate a single file from an RDD. But you can do this in Sedona by repartitioning. Use repartition(1) or coalesce(1) to make the resulting RDD have only 1 partition. Then call saveAsGeoJSON. The output will be a single folder with a single file inside.

Note that:
(1) If your RDD is huge, repartitioning it to 1 partition might crash the cluster, since it puts all the data on a single machine.
(2) Use repartition(1) if possible, because some users report that coalesce(1) leads to missing results.

Thanks,
Jia

On Fri, Aug 12, 2022 at 12:46 PM Bates, Oisin <[email protected]> wrote:

Hi,

I have been using Sedona lately and encountered a specific use case that I believe is not currently supported. Currently, we are using Python and writing our output to an Amazon S3 bucket via Sedona's saveAsGeoJSON()<https://sedona.apache.org/tutorial/core-python/#save-to-permanent-storage> function. The default here is to save a partitioned/distributed file.
Is it realistic to consider the option to write the GeoJSON output as a single file, or am I overlooking something fundamental in Sedona Core? I was thinking that something similar to pyspark.sql.DataFrame.coalesce<https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.coalesce.html> might be the most logical implementation?

If my thoughts here seem reasonable, I'm happy to create a Jira ticket also.

Appreciate your time and help on this.

Best,
Oisín
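
Note on the single-file question in this thread: where repartitioning to 1 partition is too risky for a large RDD, one possible alternative is to keep the partitioned output and merge the part files afterwards. The following is only a minimal sketch of that post-processing step in plain Python, not something Sedona provides. It assumes the output folder is on a local filesystem (not S3) and that every non-empty line of each part-* file is one GeoJSON Feature object; both are assumptions to verify against your own output. The function name merge_geojson_parts and the paths are hypothetical.

```python
import json
from pathlib import Path


def merge_geojson_parts(parts_dir, out_file):
    """Combine Spark-style part files into one GeoJSON FeatureCollection.

    Assumption: every non-empty line in each part-* file is one GeoJSON
    Feature object. Marker files such as _SUCCESS are skipped because
    they do not match the part-* pattern.
    """
    features = []
    # Sort the part files so the output order is deterministic.
    for part in sorted(Path(parts_dir).glob("part-*")):
        with part.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    features.append(json.loads(line))
    collection = {"type": "FeatureCollection", "features": features}
    Path(out_file).write_text(json.dumps(collection), encoding="utf-8")
    # Return the number of merged features as a quick sanity check.
    return len(features)
```

This reads everything into memory on one machine, so it has the same size limit as the repartition(1) approach; it only moves that cost out of the Spark job.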
