robertnagy1 opened a new issue, #886: URL: https://github.com/apache/sedona/issues/886
Hi, I see that PR [SEDONA-177] implemented spatial predicates at the RDD level, but I cannot find them in the Python libraries. Are they implemented for joins on SRDDs?

I am trying to find overlapping polygons within a single shapefile that has about 4 million features. Normally I would do one of the following:

- A self-join: SELECT t1.* FROM table AS t1 JOIN table AS t2 ON st_overlaps(t1.geometry, t2.geometry) WHERE t1.id <> t2.id, after which I would probably aggregate to count how many geometries with different IDs overlap.
- A lateral join: SELECT t1.id, c.counter FROM table AS t1 LEFT JOIN LATERAL (SELECT count(*) AS counter FROM table AS t2 WHERE st_overlaps(t1.geometry, t2.geometry) AND (t1.id <> t2.id)) AS c

The first query runs for 15 minutes. The second does not run at all, because it is a correlated query and Spark does not allow it.

So I wonder how much less time it would take to check the overlaps directly on SRDDs rather than running a SQL query on DataFrames. The runtime seems particularly long given that I run it on 8 workers with 4 cores each. Does saving the files in a different format, such as Delta Lake or GeoParquet, speed anything up?
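For reference, the per-ID count that the lateral query is after can also be written without a correlated subquery, as a left self-join followed by a GROUP BY. The sketch below only illustrates that query shape. It makes several assumptions that are not from the setup above: it runs on an in-memory SQLite database instead of Spark, the table name `parcels` is hypothetical, and `st_overlaps` is a simplified stand-in registered from Python that works on axis-aligned boxes encoded as "xmin,ymin,xmax,ymax" strings. It is not Sedona's ST_Overlaps and says nothing about runtime on 4 million features.

```python
import sqlite3


def boxes_overlap(a: str, b: str) -> int:
    """Simplified stand-in predicate (assumption, not Sedona's ST_Overlaps).

    Returns 1 when two axis-aligned boxes given as 'xmin,ymin,xmax,ymax'
    share interior area. Unlike a real overlaps test, it does not exclude
    the case where one box fully contains the other.
    """
    ax1, ay1, ax2, ay2 = map(float, a.split(","))
    bx1, by1, bx2, by2 = map(float, b.split(","))
    return int(ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2)


def overlap_counts(rows):
    """Count, for every id, how many other ids it overlaps with."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function under the SQL name used in the query.
    conn.create_function("st_overlaps", 2, boxes_overlap)
    conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, geometry TEXT)")
    conn.executemany("INSERT INTO parcels VALUES (?, ?)", rows)

    # A LEFT JOIN keeps ids with no overlaps; COUNT(t2.id) ignores the NULLs
    # produced for them, so they get a counter of 0.
    query = """
        SELECT t1.id, COUNT(t2.id) AS counter
        FROM parcels AS t1
        LEFT JOIN parcels AS t2
          ON t1.id <> t2.id AND st_overlaps(t1.geometry, t2.geometry)
        GROUP BY t1.id
        ORDER BY t1.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data: boxes 1-2 and 2-3 overlap, box 4 is isolated.
sample = [
    (1, "0,0,2,2"),
    (2, "1,1,3,3"),
    (3, "2.5,2.5,4,4"),
    (4, "10,10,11,11"),
]
print(overlap_counts(sample))
```

The same SELECT should in principle carry over to a DataFrame view with the real predicate in the join condition, but that has not been checked here against Spark or Sedona.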