This is an automated email from the ASF dual-hosted git repository.

jiayu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/sedona-spatialbench.git


The following commit(s) were added to refs/heads/main by this push:
     new 8ca864a  docs: add uncompressed data sizes of tables (#68)
8ca864a is described below

commit 8ca864a4a9c3a5d9682a14de20a5693c1e987139
Author: Matthew Powers <[email protected]>
AuthorDate: Mon Dec 15 12:37:08 2025 -0500

    docs: add uncompressed data sizes of tables (#68)
    
    * docs: add uncompressed data sizes of tables
    
    * Apply suggestion from @jiayuasu
    
    ---------
    
    Co-authored-by: Jia Yu <[email protected]>
---
 docs/datasets-generators.md | 14 ++++++++++++++
 docs/quickstart.md          |  7 ++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/docs/datasets-generators.md b/docs/datasets-generators.md
index 5c8d976..5392ebe 100644
--- a/docs/datasets-generators.md
+++ b/docs/datasets-generators.md
@@ -116,3 +116,17 @@ Here are the contents of the `sf1-parquet` directory:
 * `zone.parquet`
 
 See [the README](https://github.com/apache/sedona-spatialbench) for a full description of how to use the SpatialBench data generators.
+
+## Data sizes
+
+Here are the uncompressed Parquet file sizes of the tables for some different scale factors:
+
+| Category | SF1        | SF10       | SF100      | SF1000      |
+|----------|------------|------------|------------|-------------|
+| Zone     | 1.3 GB  | 2.0 GB  | 5.4 GB  | 5.7 GB   |
+| Trip     | 471.1 MB| 5.0 GB  | 50.4 GB | 512.7 GB |
+| Building | 2.4 MB  | 10.2 MB | 18.0 MB | 0.03 GB   |
+| Customer | 2.5 MB  | 23.1 MB | 227.1 MB| 2.2 GB   |
+| Driver   | 0.04 MB  | 0.4 MB  | 4.0 MB  | 0.03 GB   |
+| Vehicle  | 0.01 MB  | 0.03 MB  | 0.3 MB  | 0.003 GB   |
+| **Total**| **1.8 GB** | **7.0 GB** | **56.0 GB** | **520.6 GB** |
diff --git a/docs/quickstart.md b/docs/quickstart.md
index 06d79e4..a4f75bb 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -38,11 +38,13 @@ spatialbench-cli --help
 ## Generate SF1 Data
 
 To generate the full dataset at scale factor 1 in Parquet format:
+
 ```shell
 spatialbench-cli --scale-factor 1
 ```
 
 This creates six tables:
+
 * trip
 * customer
 * driver
@@ -65,11 +67,13 @@ spatialbench-cli --scale-factor 1 --tables trip,building
 ### Partition Table Output into Multiple Files
 
 Specify the number of partitions manually:
+
 ```shell
 spatialbench-cli --scale-factor 10 --tables trip --parts 4
 ```
 
 Or let the CLI determine the number of files using target size:
+
 ```shell
 spatialbench-cli --scale-factor 10 --mb-per-file 512
 ```
@@ -85,6 +89,7 @@ spatialbench-cli --scale-factor 1 --output-dir data/sf1
 SpatialBench uses a spatial data generator to generate synthetic points and polygons using realistic spatial distributions.
 
 To read more about the different spatial distributions offered by SpatialBench see [here](https://sedona.apache.org/spatialbench/spatialbench-distributions/).
+
 For more details about tuning the spatial distributions and the full YAML schema and examples, see [CONFIGURATION.md](https://github.com/apache/sedona-spatialbench/blob/main/spatialbench-cli/CONFIGURATION.md).
 
 You can override these defaults at runtime by passing a YAML file via the `--config` flag:
@@ -93,4 +98,4 @@ You can override these defaults at runtime by passing a YAML file via the `--con
 spatialbench-cli --scale-factor 1 --config spatialbench-config.yml
 ```
 
-If `--config` is not provided, SpatialBench checks for ./spatialbench-config.yml. If absent, it falls back to built-in defaults.
+If `--config` is not provided, SpatialBench checks for `./spatialbench-config.yml`. If absent, it falls back to built-in defaults.
