This is an automated email from the ASF dual-hosted git repository.

jiayu pushed a commit to branch update-readme
in repository https://gitbox.apache.org/repos/asf/sedona-spatialbench.git


commit 0ae1e90408c304da090de8430b17c9f387bd65df
Author: Jia Yu <[email protected]>
AuthorDate: Wed Jan 14 20:29:32 2026 -0800

    Update docs and readme
---
 README.md                      | 32 ++++++++++++++++++++++++++++++++
 docs/index.md                  | 11 +++++++++++
 docs/single-node-benchmarks.md | 36 +++++++++++++++++++++++++++++++++++-
 3 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 24680ea..2ffccc3 100644
--- a/README.md
+++ b/README.md
@@ -39,6 +39,38 @@ You can print the queries in your dialect of choice using the following command:
 ./spatialbench-queries/print_queries.py <dialect>
 ```
 
+## Automated Benchmarks
+
+SpatialBench includes an automated benchmark framework that runs on GitHub Actions to verify that all queries are fully runnable across supported engines.
+
+> **Note:** The GitHub Actions benchmark is designed to validate correctness and runnability, not for serious performance comparisons. For meaningful performance benchmarks, please run SpatialBench on dedicated hardware with appropriate scale factors. See the [Single Node Benchmarks](https://sedona.apache.org/spatialbench/single-node-benchmarks/) page for detailed performance results.
+
+The automated tests cover:
+
+- 🦆 **DuckDB** - In-process analytical database with spatial extension
+- 🐼 **GeoPandas** - Python geospatial data analysis library
+- 🌵 **SedonaDB** - High-performance spatial analytics engine
+- 🐻‍❄️ **Spatial Polars** - Geospatial extension for Polars dataframes
+
+### View Latest Results
+
+You can view the latest results on the [GitHub Actions page](../../actions/workflows/benchmark.yml). Click on any successful workflow run to see the summary with:
+
+- Query execution times for each engine
+- Performance comparison across all 12 queries
+- Winner highlighting for each query
+
+### Run Benchmarks Manually
+
+You can trigger a benchmark run manually from the [Actions tab](../../actions/workflows/benchmark.yml) with configurable options:
+
+- **Scale Factor**: 0.1, 1, or 10
+- **Engines**: Select which engines to benchmark
+- **Query Timeout**: Adjust timeout for longer queries
+- **Runs per Query**: 1, 3, or 5 runs for averaging
+
+The benchmark data is automatically downloaded from [Hugging Face](https://huggingface.co/datasets/apache-sedona/spatialbench) and cached for subsequent runs.
+
 ## Data Model
 
 SpatialBench defines a spatial star schema with the following tables:
diff --git a/docs/index.md b/docs/index.md
index 55046a6..c33c007 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -86,6 +86,17 @@ ORDER BY nearby_pickup_count DESC;
 This query performs a distance join, followed by an aggregation.
 It's a great example of a query that's useful for performance benchmarking a spatial engine that can process vector geometries.
 
+## Automated Testing
+
+SpatialBench includes an automated benchmark that runs on GitHub Actions to verify that all queries are fully runnable across supported engines (DuckDB, GeoPandas, SedonaDB, and Spatial Polars).
+
+**[View the latest test results →](https://github.com/apache/sedona-spatialbench/actions/workflows/benchmark.yml)**
+
+Click on any successful workflow run and scroll to the **Summary** section to see the results.
+
+!!! note
+    The GitHub Actions benchmark is designed to validate correctness and runnability, not for serious performance comparisons. For meaningful performance benchmarks, see the [Single Node Benchmarks](single-node-benchmarks.md) page.
+
 ## Join the community
 
 Feel free to start a [GitHub Discussion](https://github.com/apache/sedona/discussions) or join the [Discord community](https://discord.gg/9A3k5dEBsY) to ask the developers any questions you may have.
diff --git a/docs/single-node-benchmarks.md b/docs/single-node-benchmarks.md
index 4337091..d1d6e37 100644
--- a/docs/single-node-benchmarks.md
+++ b/docs/single-node-benchmarks.md
@@ -97,7 +97,41 @@ SedonaDB completes KNN joins at both SF 1 and SF 10, thanks to its native operat
 SedonaDB demonstrates balanced performance across all query types and scales effectively to SF 10.
 DuckDB excels at spatial filters and some geometric operations but faces challenges with complex joins and KNN queries.
 GeoPandas, while popular in the Python ecosystem, requires manual optimization and parallelization to handle larger datasets effectively.
 
-## Benchmark code
+## Automated Benchmarks (GitHub Actions)
+
+We run automated benchmarks on every pull request and periodically via GitHub Actions to verify that all SpatialBench queries are fully runnable across supported engines.
+
+!!! note "Not for Performance Comparison"
+    The GitHub Actions benchmark is designed to validate correctness and runnability, **not** for serious performance comparisons. GitHub Actions runners have variable performance characteristics and limited resources. For meaningful performance benchmarks, please run SpatialBench on dedicated hardware with appropriate scale factors as described in the sections above.
+
+### View Latest Results
+
+Visit the [GitHub Actions Benchmark Page](https://github.com/apache/sedona-spatialbench/actions/workflows/benchmark.yml) to see the latest results. Click on any successful workflow run and scroll to the **Summary** section to view:
+
+- Query execution status for each engine
+- Comparison across all 12 queries
+- Error and timeout information
+
+### Supported Engines
+
+The automated tests cover:
+
+- 🦆 **DuckDB** - In-process analytical database with spatial extension
+- 🐼 **GeoPandas** - Python geospatial data analysis library
+- 🌵 **SedonaDB** - High-performance spatial analytics engine
+- 🐻‍❄️ **Spatial Polars** - Geospatial extension for Polars dataframes
+
+### Run Your Own Benchmark
+
+You can trigger the automated tests manually from the [Actions tab](https://github.com/apache/sedona-spatialbench/actions/workflows/benchmark.yml) with configurable options:
+
+- **Scale Factor**: 0.1, 1, or 10
+- **Engines**: Select which engines to test
+- **Query Timeout**: Adjust timeout for longer queries (default: 60s)
+- **Runs per Query**: 1, 3, or 5 runs for averaging (default: 3)
+- **Package Versions**: Pin specific versions or use latest
+
+## Benchmark Code
 
 You can access and run the benchmark code in the [sedona-spatialbench GitHub](https://github.com/apache/sedona-spatialbench) repository.
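For readers of this commit, the manually configurable options described in the diff correspond to a `workflow_dispatch` block in the workflow file. A minimal sketch of what such a block could look like — the input names, descriptions, and defaults here are assumptions for illustration, not copied from the actual `benchmark.yml`:

```yaml
# Hypothetical workflow_dispatch inputs mirroring the options the docs describe.
# The real .github/workflows/benchmark.yml may use different names and defaults.
on:
  workflow_dispatch:
    inputs:
      scale_factor:
        description: "Benchmark scale factor"
        type: choice
        options: ["0.1", "1", "10"]
        default: "1"
      engines:
        description: "Engines to test (e.g. duckdb, geopandas, sedonadb, spatial-polars)"
        type: string
        default: "duckdb,geopandas,sedonadb,spatial-polars"
      query_timeout:
        description: "Per-query timeout in seconds"
        type: string
        default: "60"
      runs_per_query:
        description: "Runs per query for averaging"
        type: choice
        options: ["1", "3", "5"]
        default: "3"
```

With a block like this in place, each input appears as a form field under "Run workflow" on the Actions tab, which matches the manual-trigger flow the updated docs walk through.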