petern48 commented on PR #10: URL: https://github.com/apache/sedona-db/pull/10#issuecomment-3246881588
I've changed it so that we generate columns of random geometries: `points_10_000`, `polygons_10_000`, `polygons_100_000`, etc.

I'm not sure (1) how we should make this configurable or (2) to what extent we should make it configurable. If we vary geometry type, simple vs. complex geometries, and the number of geometries, that's a lot of dimensions. How much do we care to drill down?

Take the current implementation of `test_st_area` (which is parametrized, unlike the rest). We can group by table (dataset size, etc.) and compare the engines at a more granular level. Notice that duckdb wins for one of the simpler datasets here, although sedonadb is faster for the rest and overall:

`pytest --benchmark-group-by=param:table test_functions.py::TestBenchFunctions::test_st_area`

<img width="1392" height="400" alt="image" src="https://github.com/user-attachments/assets/b1cd51a2-e889-4258-ba3f-576f53ce5ee2" />

Or we can just benchmark them at the function level (e.g. `st_buffer`):

`pytest --benchmark-group-by=func test_functions.py::TestBenchFunctions::test_st_buffer`

<img width="1260" height="103" alt="image" src="https://github.com/user-attachments/assets/ee825604-fc62-4ebe-987e-a2ed7e8ed1bc" />
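
To make the "lot of dimensions" concern concrete, here is a minimal sketch of how quickly the grid grows. Only the `<type>_<count>` naming follows the names above; the dimension values, the complexity axis, and the helper names `table_names` and `benchmark_cases` are hypothetical and not taken from the PR.

```python
from itertools import product

# Hypothetical dimension values, chosen only for illustration.
GEOM_TYPES = ["points", "polygons"]
COMPLEXITIES = ["simple", "complex"]
SIZES = [10_000, 100_000]
ENGINES = ["sedonadb", "duckdb"]


def table_names(geom_types, sizes):
    """Build names such as 'points_10_000', one per (type, size) pair."""
    # The ':_' format spec inserts '_' as a thousands separator.
    return [f"{geom}_{size:_}" for geom, size in product(geom_types, sizes)]


def benchmark_cases(engines, geom_types, complexities, sizes):
    """Enumerate every (engine, type, complexity, size) combination."""
    return list(product(engines, geom_types, complexities, sizes))


if __name__ == "__main__":
    print(table_names(GEOM_TYPES, SIZES))
    n_cases = len(benchmark_cases(ENGINES, GEOM_TYPES, COMPLEXITIES, SIZES))
    print(n_cases, "benchmark cases per function")
```

Even with two values per dimension this is already 16 cases per function. A list like the one from `table_names` could presumably be fed to `pytest.mark.parametrize("table", ...)` so that `--benchmark-group-by=param:table` groups results per table, but that wiring is an assumption rather than what the PR currently does.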