paleolimbot commented on issue #225: URL: https://github.com/apache/sedona-db/issues/225#issuecomment-3786761515
Posting a workaround here from https://gis.stackexchange.com/questions/497917/efficient-spatial-join-with-overturemaps-geoparquet-in-sedonadb/499791#499791 : ```python import os import sedona.db os.environ["AWS_SKIP_SIGNATURE"] = "true" os.environ["AWS_DEFAULT_REGION"] = "us-west-2" sd = sedona.db.connect() sd.read_parquet( "s3://overturemaps-us-west-2/release/2026-01-21.0/theme=buildings/type=building/" ).to_view("buildings") sd.sql("SELECT ST_SetSRID(ST_POINT(12, 23), 4326) AS geometry").to_view("points") # Workaround: resolve the portion of the very large remote table you need in advance # using .to_memtable(). You can't use a subquery here because that is optimized into a # spatial join. envelope = sd.sql("""SELECT ST_Envelope_Agg(geometry) FROM points""").to_pandas().iloc[0,0].wkt sd.sql( f""" SELECT * FROM buildings WHERE ST_Intersects(geometry, ST_GeomFromWKT('{envelope}', 4326)) """ ).to_memtable().to_view("buildings_filtered") # Next, do the join. This completes for me in ~45s. sd.sql(""" SELECT points.*, buildings_filtered.id as building FROM points INNER JOIN buildings_filtered ON ST_Contains(buildings_filtered.geometry, points.geometry) """).show() #> ┌──────────┬──────────┐ #> │ geometry ┆ building │ #> │ geometry ┆ utf8 │ #> ╞══════════╪══════════╡ #> └──────────┴──────────┘ ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
