This is an automated email from the ASF dual-hosted git repository.
jiayu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/sedona-spatialbench.git
The following commit(s) were added to refs/heads/main by this push:
new 8ca864a docs: add uncompressed data sizes of tables (#68)
8ca864a is described below
commit 8ca864a4a9c3a5d9682a14de20a5693c1e987139
Author: Matthew Powers <[email protected]>
AuthorDate: Mon Dec 15 12:37:08 2025 -0500
docs: add uncompressed data sizes of tables (#68)
* docs: add uncompressed data sizes of tables
* Apply suggestion from @jiayuasu
---------
Co-authored-by: Jia Yu <[email protected]>
---
docs/datasets-generators.md | 14 ++++++++++++++
docs/quickstart.md | 7 ++++++-
2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/docs/datasets-generators.md b/docs/datasets-generators.md
index 5c8d976..5392ebe 100644
--- a/docs/datasets-generators.md
+++ b/docs/datasets-generators.md
@@ -116,3 +116,17 @@ Here are the contents of the `sf1-parquet` directory:
* `zone.parquet`
See [the README](https://github.com/apache/sedona-spatialbench) for a full
description of how to use the SpatialBench data generators.
+
+## Data sizes
+
+Here are the uncompressed Parquet file sizes of the tables for some different scale factors:
+
+| Category | SF1 | SF10 | SF100 | SF1000 |
+|----------|------------|------------|------------|-------------|
+| Zone | 1.3 GB | 2.0 GB | 5.4 GB | 5.7 GB |
+| Trip | 471.1 MB| 5.0 GB | 50.4 GB | 512.7 GB |
+| Building | 2.4 MB | 10.2 MB | 18.0 MB | 0.03 GB |
+| Customer | 2.5 MB | 23.1 MB | 227.1 MB| 2.2 GB |
+| Driver | 0.04 MB | 0.4 MB | 4.0 MB | 0.03 GB |
+| Vehicle | 0.01 MB | 0.03 MB | 0.3 MB | 0.003 GB |
+| **Total**| **1.8 GB** | **7.0 GB** | **56.0 GB** | **520.6 GB** |
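The **Total** row can be sanity-checked by converting each entry in a column to a common unit and summing. The sketch below does this for the SF1 column. It assumes 1 GB = 1000 MB, since the table does not say whether the units are decimal or binary, and `sum_sizes_gb` is a hypothetical helper, not part of SpatialBench.

```shell
# Hypothetical helper: reads "<number> <unit>" lines on stdin and prints the sum in GB.
# Assumption: 1 GB = 1000 MB (the table does not state decimal vs binary units).
sum_sizes_gb() {
  awk '{
    # Convert every entry to MB before adding it to the running total.
    mb = ($2 == "GB") ? $1 * 1000 : $1
    total += mb
  } END { printf "%.1f\n", total / 1000 }'
}

# SF1 column from the table: Zone, Trip, Building, Customer, Driver, Vehicle.
printf '%s\n' "1.3 GB" "471.1 MB" "2.4 MB" "2.5 MB" "0.04 MB" "0.01 MB" | sum_sizes_gb
# prints 1.8
```

The result matches the 1.8 GB reported in the **Total** row for SF1.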
diff --git a/docs/quickstart.md b/docs/quickstart.md
index 06d79e4..a4f75bb 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -38,11 +38,13 @@ spatialbench-cli --help
## Generate SF1 Data
To generate the full dataset at scale factor 1 in Parquet format:
+
```shell
spatialbench-cli --scale-factor 1
```
This creates six tables:
+
* trip
* customer
* driver
@@ -65,11 +67,13 @@ spatialbench-cli --scale-factor 1 --tables trip,building
### Partition Table Output into Multiple Files
Specify the number of partitions manually:
+
```shell
spatialbench-cli --scale-factor 10 --tables trip --parts 4
```
Or let the CLI determine the number of files using target size:
+
```shell
spatialbench-cli --scale-factor 10 --mb-per-file 512
```
@@ -85,6 +89,7 @@ spatialbench-cli --scale-factor 1 --output-dir data/sf1
SpatialBench uses a spatial data generator to generate synthetic points and
polygons using realistic spatial distributions.
To read more about the different spatial distributions offered by SpatialBench
see [here](https://sedona.apache.org/spatialbench/spatialbench-distributions/).
+
For more details about tuning the spatial distributions and the full YAML
schema and examples, see
[CONFIGURATION.md](https://github.com/apache/sedona-spatialbench/blob/main/spatialbench-cli/CONFIGURATION.md).
You can override these defaults at runtime by passing a YAML file via the `--config` flag:
@@ -93,4 +98,4 @@ You can override these defaults at runtime by passing a YAML file via the `--con
spatialbench-cli --scale-factor 1 --config spatialbench-config.yml
```
-If `--config` is not provided, SpatialBench checks for ./spatialbench-config.yml. If absent, it falls back to built-in defaults.
+If `--config` is not provided, SpatialBench checks for `./spatialbench-config.yml`. If absent, it falls back to built-in defaults.
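The lookup order described in that last changed line can be illustrated with a small shell sketch. It only mirrors the documented behavior and is not how `spatialbench-cli` is implemented; `resolve_config` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper mirroring the documented lookup order; not part of spatialbench-cli.
# $1 is the path that would be passed via --config, or empty if the flag is omitted.
resolve_config() {
  explicit="$1"
  if [ -n "$explicit" ]; then
    # 1. An explicit --config value takes precedence.
    printf '%s\n' "$explicit"
  elif [ -f ./spatialbench-config.yml ]; then
    # 2. Otherwise use ./spatialbench-config.yml if it exists.
    printf '%s\n' "./spatialbench-config.yml"
  else
    # 3. Otherwise fall back to the built-in defaults.
    printf '%s\n' "built-in defaults"
  fi
}
```

For example, `resolve_config ""` run in a directory without `spatialbench-config.yml` prints `built-in defaults`, while `resolve_config my.yml` prints `my.yml`.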