Hey Sedona Devs,

I'm working on optimizing a spatial join (points and polygons), and I'm seeing significant data skew that hurts performance. I've tried increasing the number of partitions with the parameter "sedona.join.numpartition", which alleviated the symptoms a bit but did little to reduce the skew. I've also tried modifying some of the other parameters on this page, with no luck: https://sedona.apache.org/api/sql/Parameter/

What additional course of action would you recommend? I'm using the SQL API, not the RDD API.
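For reference, here is a minimal sketch of how the partition setting can be applied at runtime from PySpark. The helper name `set_join_partitions` and the value 2000 are hypothetical, chosen only for illustration, and the sketch assumes the setting is picked up from the session's runtime configuration via `spark.conf.set`.

```python
def set_join_partitions(spark, num_partitions):
    """Set Sedona's join partition count on an existing SparkSession.

    `spark` is assumed to be a SparkSession with Sedona already registered.
    The value is passed as a string, the way Spark stores config values.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be a positive integer")
    spark.conf.set("sedona.join.numpartition", str(num_partitions))
    return spark


# Hypothetical usage; 2000 is a placeholder, not a tuned value:
# set_join_partitions(spark, 2000)
```

The setting has to be in place before the join query is planned, so it would be called ahead of the `spark.sql(...)` statement that runs the join.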
Attached is a screenshot of the distribution of task runtimes from the Spark History Server page. I'd be happy to provide any additional information you need.

Thanks,
Andrew Alex
