robertnagy1 opened a new issue, #886:
URL: https://github.com/apache/sedona/issues/886

   Hi, 
   
   I see that the PR [SEDONA-177] implemented spatial predicates at the 
RDD level, but I cannot find them in the Python libraries. Are they 
implemented for joins on SpatialRDDs (SRDDs)? 
   
   I am trying to find overlapping polygons within a single shapefile that has 
about 4 million features. What I would normally do is one of the following:
   
   - SELECT t1.* FROM table AS t1 JOIN table AS t2 ON ST_Overlaps(t1.geometry, 
t2.geometry) WHERE t1.id <> t2.id, and then I would probably aggregate to find 
out how many geometries with different IDs overlap.
   - Or I would do a lateral join: SELECT t1.id, c.counter FROM table AS t1 
LEFT JOIN LATERAL (SELECT count(*) AS counter FROM table AS t2 WHERE 
ST_Overlaps(t1.geometry, t2.geometry) AND t1.id <> t2.id) AS c ON true
   
   The first query runs for 15 minutes. The second query doesn't run at all 
because it is a correlated query, which is not allowed in Spark. So I wonder 
how much less time it would take to check the overlaps directly on SRDDs, 
rather than running a SQL query on DataFrames. 
   I find 15 minutes particularly long given that I run the query on 8 
workers with 4 cores each.
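
For reference, the per-ID count that the lateral join was meant to produce can probably be obtained without a correlated subquery, by aggregating the first query instead. The following is a minimal sketch, not a tested Sedona recipe: the helper `overlap_count_query` and the view name `features` are hypothetical, and it assumes the view has `id` and `geometry` columns and that Sedona's SQL functions are registered on the Spark session. Unlike the left lateral join, the inner join omits IDs with no overlaps instead of reporting a count of 0.

```python
def overlap_count_query(view: str, id_col: str = "id", geom_col: str = "geometry") -> str:
    """Build a Spark SQL string that counts, per id, how many other features overlap it.

    `view`, `id_col` and `geom_col` are illustrative names, not taken from any real dataset.
    """
    return (
        f"SELECT t1.{id_col} AS id, COUNT(*) AS counter\n"
        f"FROM {view} AS t1\n"
        f"JOIN {view} AS t2\n"
        f"  ON ST_Overlaps(t1.{geom_col}, t2.{geom_col})\n"
        f"WHERE t1.{id_col} <> t2.{id_col}\n"
        f"GROUP BY t1.{id_col}"
    )


# "features" is a hypothetical temp view holding the shapefile contents.
query = overlap_count_query("features")
print(query)

# With a SparkSession `spark` that has Sedona registered (assumption):
# counts = spark.sql(query)
```

Keeping the spatial predicate alone in the ON clause and the ID filter in WHERE mirrors the first query above, so the two should be planned the same way up to the final aggregation.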
   
   Does saving the files in a different format, such as Delta Lake or 
GeoParquet, speed anything up in particular?
   
   
   
   
   
   

