Re: [PR] WIP: Pytest benchmark proposal [sedona-db]

via GitHub Tue, 02 Sep 2025 09:17:36 -0700


paleolimbot commented on code in PR #10:
URL: https://github.com/apache/sedona-db/pull/10#discussion_r2316550987



##########
python/sedonadb/benchmarks/test_bench_base.py:
##########
@@ -0,0 +1,25 @@
+from sedonadb.testing import DuckDB, PostGIS, SedonaDB
+
+
+class TestBenchBase:
+    def setup_class(self):
+        self.sedonadb = SedonaDB.create_or_skip()
+        self.postgis = PostGIS.create_or_skip()
+        self.duckdb = DuckDB.create_or_skip()
+
+        # Setup tables
+        num_rows = 10000
+        create_points_query = f"CREATE TABLE points AS SELECT ST_GeomFromText('POINT(0 0)') AS geom FROM range({num_rows})"

Review Comment:
   The `DBEngine` subclasses already have this abstracted, so you can create a 
table from a GeoParquet file or a GeoPandas data frame. You can use the 
`geoarrow_data` fixture to write benchmarks against real data, or you can use 
the `sd_random_geometry()` table function to generate it (Kristin's join 
integration tests are a great example).
   
   Synthetic data probably makes sense here: points, segments (linestrings with 
a vertex count of 2), polygon, `complex_linestring`, and `complex_polygon`. The 
number of batches could be configurable so that you can run either tiny or big 
benchmarks (this is what we do in Rust, too).
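A minimal sketch of the configurable synthetic data described above, assuming plain Python is acceptable for generating it. This stand-in builds WKT strings itself rather than calling `sd_random_geometry()` (whose options aren't shown in this thread); the table names, vertex counts, and the `random_wkt()` / `synthetic_wkt()` helpers are hypothetical. Changing `num_batches` is what moves a run between tiny and big.

```python
import math
import random

# Hypothetical vertex counts per synthetic table; the names mirror the review
# comment, the counts are placeholder assumptions.
VERTEX_COUNTS = {
    "points": 1,
    "segments": 2,
    "polygons": 4,
    "complex_linestring": 500,
    "complex_polygon": 500,
}


def _coord(x, y):
    return f"{x:.6f} {y:.6f}"


def random_wkt(kind, rng):
    """Return one random geometry of the given kind as WKT."""
    n = VERTEX_COUNTS[kind]
    x0, y0 = rng.uniform(-170, 170), rng.uniform(-80, 80)
    if kind == "points":
        return f"POINT ({_coord(x0, y0)})"
    if kind in ("segments", "complex_linestring"):
        # Strictly increasing x keeps the line from self-intersecting.
        coords = [
            _coord(x0 + 0.01 * i, y0 + rng.uniform(-0.005, 0.005))
            for i in range(n)
        ]
        return f"LINESTRING ({', '.join(coords)})"
    # Polygons: n vertices evenly spaced on a small circle, ring closed.
    ring = [
        _coord(x0 + 0.1 * math.cos(2 * math.pi * i / n),
               y0 + 0.1 * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]
    ring.append(ring[0])
    return f"POLYGON (({', '.join(ring)}))"


def synthetic_wkt(kind, num_batches=1, batch_size=8192, seed=42):
    """Generate num_batches * batch_size geometries with a fixed seed."""
    rng = random.Random(seed)
    return [random_wkt(kind, rng) for _ in range(num_batches * batch_size)]
```

The resulting list could be wrapped in a GeoPandas frame and loaded through the `DBEngine` GeoPandas path mentioned above, or swapped for `sd_random_geometry()` once its options are settled.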



##########
python/sedonadb/benchmarks/test_functions.py:
##########
@@ -0,0 +1,115 @@
+import pytest
+from test_bench_base import TestBenchBase
+from sedonadb.testing import DuckDB, geom_or_null, PostGIS, SedonaDB, val_or_null
+
+
+class TestBenchFunctions(TestBenchBase):
+    @pytest.mark.parametrize("eng", [SedonaDB, PostGIS, DuckDB])
+    def test_st_area(self, benchmark, eng):
+        eng = self._get_eng(eng)
+
+        def queries():
+            for geom in [
+                "POINT EMPTY",
+                "POINT(1 1)",
+                "LINESTRING(0 0, 1 1)",
+                "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))",
+                "MULTIPOLYGON(((0 0, 1 0, 1 1, 0 1, 0 0), (0.5 0.5, 0.6 0.6, 0.5 0.7, 0.4 0.6, 0.5 0.5)))",
+                "GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(0 0, 1 1), POLYGON((0 0, 1 0, 1 1, 0 1, 0 0)))",
+            ]:
+                eng.execute_and_collect(f"SELECT ST_Area({geom_or_null(geom)})")

Review Comment:
   Unlike unit tests, where we want to parameterize every single case, here we 
probably want to target specific queries. For example, for ST_Area() we'd like 
to benchmark performance against a simple polygon and a complex polygon; we're 
not so interested in the performance of the corner cases.
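A minimal sketch of the targeted-query idea for ST_Area(), assuming one small hand-written polygon and one generated many-vertex polygon are representative enough. The `regular_polygon_wkt()` helper, the `ST_AREA_CASES` names, and the 1000-vertex size are hypothetical choices, not anything defined in the PR.

```python
import math


def regular_polygon_wkt(num_vertices, radius=1.0):
    """WKT for a regular polygon with num_vertices vertices (ring closed)."""
    coords = [
        (radius * math.cos(2 * math.pi * i / num_vertices),
         radius * math.sin(2 * math.pi * i / num_vertices))
        for i in range(num_vertices)
    ]
    coords.append(coords[0])  # WKT rings repeat the first vertex at the end
    body = ", ".join(f"{x:.6f} {y:.6f}" for x, y in coords)
    return f"POLYGON (({body}))"


# Targeted inputs: one cheap polygon and one vertex-heavy polygon, instead of
# the POINT EMPTY / GEOMETRYCOLLECTION corner cases the unit tests cover.
ST_AREA_CASES = {
    "simple_polygon": "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))",
    "complex_polygon": regular_polygon_wkt(1000),
}


def st_area_queries():
    """Yield (case_id, sql) pairs; case_id can serve as a pytest param id."""
    for case_id, wkt in ST_AREA_CASES.items():
        yield case_id, f"SELECT ST_Area(ST_GeomFromText('{wkt}'))"
```

Each pair could feed `@pytest.mark.parametrize` (with `ids=` set to the case names) next to the existing engine parameter, so the benchmark report shows one row per engine and case.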



##########
python/sedonadb/benchmarks/test_bench_base.py:
##########


Review Comment:
   I think this should live in just `benchmarks/`, because running benchmarks 
is generally time-consuming and done less frequently than running unit tests. 
This will require repeating some of the fixtures, but you can still use the 
`sedonadb.testing` module.
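Keeping the benchmarks in their own `benchmarks/` directory means re-declaring the fixtures they need in that directory's `conftest.py`. A minimal sketch, assuming a hypothetical `--num-batches` command-line option that controls benchmark size; the option and fixture names are illustrative, not existing sedona-db code.

```python
# benchmarks/conftest.py (hypothetical): fixtures re-declared for the
# standalone benchmark directory.
import pytest


def pytest_addoption(parser):
    # Hypothetical option: scale synthetic data from tiny to big runs.
    parser.addoption(
        "--num-batches",
        action="store",
        type=int,
        default=1,
        help="Number of synthetic batches per benchmark table",
    )


@pytest.fixture(scope="session")
def num_batches(request):
    # Read the option once per session so every benchmark uses the same size.
    return request.config.getoption("--num-batches")
```

Running `pytest benchmarks/ --num-batches 100` would then select a bigger configuration, while running pytest against the unit-test directory leaves the benchmarks out entirely.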



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

