Re: [PR] WIP: Pytest benchmark proposal [sedona-db]

via GitHub Tue, 02 Sep 2025 14:38:01 -0700


petern48 commented on PR #10:
URL: https://github.com/apache/sedona-db/pull/10#issuecomment-3246881588


   I've changed it so that we generate columns of random geometries: 
`points_10_000`, `polygons_10_000`, `polygons_100_000`, etc. I'm not sure 
(1) how we should make this configurable or (2) to what extent we should make 
it configurable. 
   
   If we vary the geometry type, simple vs. complex geometries, and the number 
of geometries, that's a lot of dimensions. How far do we want to drill 
down?
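
To get a feel for how quickly these dimensions multiply, here is a minimal sketch that enumerates the grid. `TableSpec`, `build_grid`, the `simple` / `complex` labels, and the naming scheme with a complexity part are assumptions for illustration, not what the PR currently does; only the `points_10_000`-style digit grouping mirrors the existing column names.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical description of one benchmark table."""

    geom_type: str   # e.g. "points" or "polygons"
    complexity: str  # e.g. "simple" or "complex" (assumed labels)
    num_rows: int

    @property
    def name(self) -> str:
        # The "_" format spec yields 10_000-style digit grouping.
        return f"{self.geom_type}_{self.complexity}_{self.num_rows:_}"


def build_grid(geom_types, complexities, sizes):
    """Return one TableSpec per combination of the three dimensions."""
    return [TableSpec(g, c, n) for g, c, n in product(geom_types, complexities, sizes)]


grid = build_grid(
    geom_types=["points", "polygons"],
    complexities=["simple", "complex"],
    sizes=[10_000, 100_000],
)
print(len(grid))     # 2 * 2 * 2 = 8 tables per benchmarked function
print(grid[0].name)  # points_simple_10_000
```

Even with two values per dimension that is already 8 tables per function, so it may be worth keeping most dimensions behind a small default set.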
   
   The current implementation of `test_st_area` is parametrized (unlike the 
rest of the tests), so we can group by table (dataset size, etc.) and 
compare the engines at a more granular level.
   (Notice that duckdb wins on one of the simpler datasets here, although 
sedonadb is faster on the rest and overall.)
   `pytest --benchmark-group-by=param:table test_functions.py::TestBenchFunctions::test_st_area`
   <img width="1392" height="400" alt="image" 
src="https://github.com/user-attachments/assets/b1cd51a2-e889-4258-ba3f-576f53ce5ee2"
 />
   
   Or we can just benchmark them at the function level (e.g. `st_buffer`):
   `pytest --benchmark-group-by=func test_functions.py::TestBenchFunctions::test_st_buffer`
   <img width="1260" height="103" alt="image" 
src="https://github.com/user-attachments/assets/ee825604-fc62-4ebe-987e-a2ed7e8ed1bc"
 />
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@sedona.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
