paleolimbot commented on code in PR #10:
URL: https://github.com/apache/sedona-db/pull/10#discussion_r2316550987

##########
python/sedonadb/benchmarks/test_bench_base.py:
##########
@@ -0,0 +1,25 @@
+from sedonadb.testing import DuckDB, PostGIS, SedonaDB
+
+
+class TestBenchBase:
+    def setup_class(self):
+        self.sedonadb = SedonaDB.create_or_skip()
+        self.postgis = PostGIS.create_or_skip()
+        self.duckdb = DuckDB.create_or_skip()
+
+        # Setup tables
+        num_rows = 10000
+        create_points_query = f"CREATE TABLE points AS SELECT ST_GeomFromText('POINT(0 0)') AS geom FROM range({num_rows})"

Review Comment:
The `DBEngine` subclass has this abstracted already such that you can create a table from a GeoParquet file or GeoPandas data frame. You can use the `geoarrow_data` fixture to write benchmarks against actual data, or you can use the `sd_random_geometry()` table function to generate it (Kristin's join integration tests are a great example). Probably synthetic data makes sense here: points, segments (linestrings with a vertex count of 2), polygon, complex_linestring, complex_polygon. The number of batches could be configurable so that you can run tiny benchmarks or big benchmarks (this is what we do in Rust, too).

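To make the synthetic-data suggestion concrete, here is a minimal pure-Python sketch of the five categories listed above with a configurable size. It is not the `sd_random_geometry()` table function (its signature is not shown in this thread), and `random_wkt` and `synthetic_table` are hypothetical names used only for illustration. The sketch uses a row count as the size knob; a batch count would work the same way.

```python
import math
import random


def random_wkt(kind, num_vertices=64, rng=None):
    """Return one synthetic WKT geometry of the given kind (hypothetical helper)."""
    rng = rng or random.Random()
    # Random anchor so that geometries are spread over the plane
    cx, cy = rng.uniform(-100, 100), rng.uniform(-100, 100)

    def fmt(coords):
        return ", ".join(f"{x:.6f} {y:.6f}" for x, y in coords)

    if kind == "points":
        return f"POINT({cx:.6f} {cy:.6f})"

    if kind in ("segments", "complex_linestring"):
        # A segment is a linestring with exactly two vertices
        n = 2 if kind == "segments" else num_vertices
        coords = [(cx + i, cy + rng.uniform(-1, 1)) for i in range(n)]
        return f"LINESTRING({fmt(coords)})"

    if kind in ("polygon", "complex_polygon"):
        n = 4 if kind == "polygon" else num_vertices
        # Vertices on a circle give a ring that does not self-intersect
        ring = [
            (cx + math.cos(2 * math.pi * i / n), cy + math.sin(2 * math.pi * i / n))
            for i in range(n)
        ]
        ring.append(ring[0])  # close the ring
        return f"POLYGON(({fmt(ring)}))"

    raise ValueError(f"unknown geometry kind: {kind}")


def synthetic_table(kind, num_rows=1000, num_vertices=64, seed=0):
    """Return num_rows WKT strings; a fixed seed keeps benchmark runs comparable."""
    rng = random.Random(seed)
    return [random_wkt(kind, num_vertices, rng) for _ in range(num_rows)]
```

With something like this, a tiny run and a big run would differ only in `num_rows` (and `num_vertices` for the complex kinds), and the resulting rows could be loaded into each engine through whatever table-creation helper the `DBEngine` subclasses provide.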
##########
python/sedonadb/benchmarks/test_functions.py:
##########
@@ -0,0 +1,115 @@
+import pytest
+from test_bench_base import TestBenchBase
+from sedonadb.testing import DuckDB, geom_or_null, PostGIS, SedonaDB, val_or_null
+
+
+class TestBenchFunctions(TestBenchBase):
+    @pytest.mark.parametrize("eng", [SedonaDB, PostGIS, DuckDB])
+    def test_st_area(self, benchmark, eng):
+        eng = self._get_eng(eng)
+
+        def queries():
+            for geom in [
+                "POINT EMPTY",
+                "POINT(1 1)",
+                "LINESTRING(0 0, 1 1)",
+                "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))",
+                "MULTIPOLYGON(((0 0, 1 0, 1 1, 0 1, 0 0), (0.5 0.5, 0.6 0.6, 0.5 0.7, 0.4 0.6, 0.5 0.5)))",
+                "GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(0 0, 1 1), POLYGON((0 0, 1 0, 1 1, 0 1, 0 0)))",
+            ]:
+                eng.execute_and_collect(f"SELECT ST_Area({geom_or_null(geom)})")

Review Comment:
Unlike unit tests, where we want to parameterize every single case, here we probably want to target specific queries. For example, for ST_Area(), we'd like to benchmark performance against a simple polygon and a complex polygon, and we're not so interested in the performance of the corner cases.

##########
python/sedonadb/benchmarks/test_bench_base.py:
##########

Review Comment:
I think this should live in just `benchmarks/`, because running benchmarks is generally time-consuming and done less frequently than unit tests. This will require some repeating of the fixtures, but you can still use the `sedonadb.testing` module.