paleolimbot commented on issue #225:
URL: https://github.com/apache/sedona-db/issues/225#issuecomment-3786761515

   Posting a workaround here from 
https://gis.stackexchange.com/questions/497917/efficient-spatial-join-with-overturemaps-geoparquet-in-sedonadb/499791#499791
 :
   
   ```python
   import os
   import sedona.db
   
   os.environ["AWS_SKIP_SIGNATURE"] = "true"
   os.environ["AWS_DEFAULT_REGION"] = "us-west-2"
   
   sd = sedona.db.connect()
   
   sd.read_parquet(
       
"s3://overturemaps-us-west-2/release/2026-01-21.0/theme=buildings/type=building/"
   ).to_view("buildings")
   
   sd.sql("SELECT ST_SetSRID(ST_POINT(12, 23), 4326) AS 
geometry").to_view("points")
   
   # Workaround: resolve the portion of the very large remote table you need in 
advance
   # using .to_memtable(). You can't use a subquery here because that is 
optimized into a
   # spatial join.
   envelope = sd.sql("""SELECT ST_Envelope_Agg(geometry) FROM 
points""").to_pandas().iloc[0,0].wkt
   sd.sql(
       f"""
       SELECT * 
       FROM buildings 
       WHERE ST_Intersects(geometry, ST_GeomFromWKT('{envelope}', 4326))
       """
   ).to_memtable().to_view("buildings_filtered")
   
   # Next, do the join. This completes for me in ~45s.
   sd.sql("""
   SELECT points.*, buildings_filtered.id as building
   FROM points
   INNER JOIN buildings_filtered ON ST_Contains(buildings_filtered.geometry, 
points.geometry)
   """).show()
   #> ┌──────────┬──────────┐
   #> │ geometry ┆ building │
   #> │ geometry ┆   utf8   │
   #> ╞══════════╪══════════╡
   #> └──────────┴──────────┘
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to