paleolimbot commented on issue #617:
URL: https://github.com/apache/sedona-db/issues/617#issuecomment-3901979492

You're correct that SedonaDB will push down `WHERE ST_Intersects(ST_GeomFromWKT('...', 4326))` automatically using the bounding box statistics in the `bbox` column of a GeoParquet file; however, DataFusion doesn't yet support pruning on struct column fields, so we can't either (support is expected in the forthcoming DataFusion 53). You can check whether pruning happened by running `EXPLAIN ANALYZE <query>` and looking at the bottom-right column of the output. For Overture I usually need to collect the result first, because it's long enough that the display gets truncated.
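
To make the bounding-box pruning idea concrete, here is a minimal Python sketch of how per-row-group min/max statistics on the fields of a `bbox` struct column (GeoParquet's `xmin`/`ymin`/`xmax`/`ymax` covering fields) can rule out row groups for an `ST_Intersects` filter. It is an illustration under simplifying assumptions (planar coordinates, no antimeridian wrapping), not SedonaDB's or DataFusion's implementation, and `BBox`, `RowGroupBBoxStats`, `may_intersect`, and `row_groups_to_read` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned query box (assumes no antimeridian wrapping)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass(frozen=True)
class RowGroupBBoxStats:
    """Hypothetical container for row-group statistics of a `bbox` struct column.

    min_xmin is the minimum of bbox.xmin over the row group, max_xmax is the
    maximum of bbox.xmax, and likewise for y.
    """
    min_xmin: float
    min_ymin: float
    max_xmax: float
    max_ymax: float


def may_intersect(stats: RowGroupBBoxStats, query: BBox) -> bool:
    """Return False only if no row in the group can intersect `query`.

    Every row's bbox lies inside [min_xmin, max_xmax] x [min_ymin, max_ymax],
    so if that extent is disjoint from the query box, ST_Intersects is false
    for every row and the whole row group can be skipped.
    """
    disjoint = (
        stats.max_xmax < query.xmin
        or stats.min_xmin > query.xmax
        or stats.max_ymax < query.ymin
        or stats.min_ymin > query.ymax
    )
    return not disjoint


def row_groups_to_read(all_stats: List[RowGroupBBoxStats], query: BBox) -> List[int]:
    """Indices of the row groups that survive bbox-based pruning."""
    return [i for i, s in enumerate(all_stats) if may_intersect(s, query)]


if __name__ == "__main__":
    stats = [
        RowGroupBBoxStats(-10.0, -10.0, -5.0, -5.0),  # entirely south-west of the query
        RowGroupBBoxStats(0.0, 0.0, 10.0, 10.0),      # overlaps the query box
        RowGroupBBoxStats(20.0, 20.0, 30.0, 30.0),    # entirely north-east of the query
    ]
    query = BBox(1.0, 1.0, 2.0, 2.0)
    print(row_groups_to_read(stats, query))  # [1]
```

This kind of pruning is conservative: a row group that survives may still contain no matching rows, so the exact `ST_Intersects` predicate is still evaluated on whatever is read.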

If pruning did occur, I suspect the `WHERE ST_Intersects(ST_GeomFromWKT('...', 4326))` query is slower in SedonaDB because DuckDB does a better job of caching remote files, metadata, and file listings. You can check whether that explains the difference by running `SET enable_external_file_cache = false;` in DuckDB. DataFusion 52 includes some improvements that should help here, as may https://github.com/apache/sedona-db/pull/294, which didn't help when the PR was opened but perhaps does now.

