Data Skew in Spatial Join

Andrew Alex Mon, 20 Dec 2021 17:17:19 -0800

Hey Sedona Devs,

I’m working on optimizing a spatial join (points and polygons) and I’m noticing 
quite a bit of data skew hurting performance. I’ve tried increasing the number 
of partitions with the parameter “sedona.join.numpartition”, which has 
alleviated the symptoms a bit but has not done much to reduce the skew itself. 
I’ve also tried modifying some of the other parameters on this page: 
https://sedona.apache.org/api/sql/Parameter/ with no luck. What additional 
course of action would you recommend? I’m using the SQL API, not the RDD API.


Attached is a screenshot of the distribution of the task runtimes from the 
Spark History Server page. I’d be happy to provide any additional information 
you need.
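
In case a single number is easier to compare than the screenshot, here is a 
minimal sketch of how the skew could be summarized as the ratio of the slowest 
task to the median task. The helper name skew_ratio and the runtimes in the 
example are placeholders I made up for illustration; they are not measurements 
from my job, and nothing here is part of Sedona or Spark.

```python
from statistics import median


def skew_ratio(task_seconds):
    """Return the slowest task runtime divided by the median task runtime.

    A value near 1 means the tasks are balanced; a large value means a few
    straggler tasks dominate the stage.
    """
    if not task_seconds:
        raise ValueError("need at least one task runtime")
    mid = median(task_seconds)
    if mid <= 0:
        raise ValueError("median runtime must be positive")
    return max(task_seconds) / mid


if __name__ == "__main__":
    # Placeholder runtimes in seconds (hypothetical, not from the real job).
    example_runtimes = [12, 14, 13, 15, 11, 240]
    print(f"max/median = {skew_ratio(example_runtimes):.1f}")
```

Comparing this ratio before and after a parameter change would show whether 
the change reduced the skew or only shortened every task proportionally.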

Thanks,

Andrew Alex


