This is an automated email from the ASF dual-hosted git repository.

jiayu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/sedona-spatialbench.git

commit 0188eb412832685d609a6c822815c22152408cfe
Author: Pranav Toggi <[email protected]>
AuthorDate: Mon Aug 25 14:06:37 2025 -0700

    [EWT-3178] Allow passing spider configs as yaml file at runtime (#7)
    
    * fix integration tests
    
    * expose spider configs as cli option
    
    * fix fmt
    
    * clippy fix
    
    * add default-config file
    
    * fix readme and file name
    
    * add default spatialbench-config.yml and document resolution order
---
 README.md                                  |  66 ++-----------
 SPIDER.md                                  | 124 +++++++++++++++++++++++
 spatialbench-cli/Cargo.toml                |   3 +
 spatialbench-cli/src/main.rs               |  47 +++++++++
 spatialbench-cli/src/spider_config_file.rs | 149 ++++++++++++++++++++++++++++
 spatialbench-config.yml                    |  29 ++++++
 spatialbench/Cargo.toml                    |   1 +
 spatialbench/src/generators.rs             |   8 +-
 spatialbench/src/lib.rs                    |   3 +-
 spatialbench/src/spider_defaults.rs        |  62 ++++++++++++
 spatialbench/src/spider_overrides.rs       |  28 ++++++
 spatialbench/src/spider_presets.rs         | 151 -----------------------------
 12 files changed, 457 insertions(+), 214 deletions(-)

diff --git a/README.md b/README.md
index e529af9..b07f7ff 100644
--- a/README.md
+++ b/README.md
@@ -87,69 +87,19 @@ for PART in $(seq 1 4); do
 done
 ```
 
-## SpatialBench Spider Data Generator
+#### Custom Spider Configuration
 
-SpatialBench includes a synthetic spatial data generator ([spider.rs](https://github.com/wherobots/sedona-spatialbench/blob/main/spatialbench/src/spider.rs)) for creating:
-- Points
-- Rectangles (boxes)
-- Polygons
+You can override these defaults at runtime by passing a YAML file via the `--config` flag:
 
-This generator is inspired by techniques from the paper [SpiderWeb: A Spatial Data Generator on the Web](https://dl.acm.org/doi/10.1145/3397536.3422351) by Katiyar et al., SIGSPATIAL 2020.
-
-### Supported Distribution Types
-
-| Type         | Description                                                   |
-|--------------|---------------------------------------------------------------|
-| `UNIFORM`    | Uniformly distributed points in `[0,1]²`                      |
-| `NORMAL`     | 2D Gaussian distribution with configurable `mu` and `sigma`   |
-| `DIAGONAL`   | Points clustered along a diagonal                             |
-| `BIT`        | Points in a grid with `2^digits` resolution                   |
-| `SIERPINSKI` | Fractal pattern using Sierpinski triangle                     |
-
-![image.png](images/spatial_distributions.png)
-
-## Configuring Spider Geometry Generation
-
-SpatialBench uses a flexible and extensible SpiderConfig struct (defined in Rust) to control how spatial geometries are generated for synthetic datasets. These configurations are defined in code, often using presets in spider_preset.rs.
-
-#### SpiderConfig Fields
-
-| Field | Type               | Description                                                                    |
-|-------|--------------------|--------------------------------------------------------------------------------|
-| `dist_type` | `DistributionType` | Type of distribution to use (Uniform, Normal, Diagonal, Bit, Sierpinski, etc.) |
-| `geom_type` | `GeomType`         | Geometry to generate: Point, Box, or Polygon                                   |
-| `dim` | `i32`              | Number of dimensions (usually 2)                                               |
-| `seed` | `u32`              | Random seed for reproducibility                                                |
-| `affine` | `Option<[f64; 6]>` | Optional 2D affine transform (scale, rotate, shift)                            |
-| `width`, `height` | `f64`              | For `box` geometries, bounding box size                                        |
-| `maxseg` | `i32`              | Maximum number of segments for polygon shapes                                  |
-| `polysize` | `f64`              | Radius or size of the polygon                                                  |
-| `params` | `DistributionParams` | Additional parameters based on distribution type                               |
-
-#### Supported DistributionParams Variants
-
-| Varient        | Field                  | Description                                                                |
-|----------------|------------------------|----------------------------------------------------------------------------|
-| `None`         | `--`                   | For distributions like Uniform or Sierpinski that don’t require parameters |
-| `Normal`       | `mu`, `sigma`          | Controls center and spread for 2D Gaussian                                 |
-| `Diagonal`     | `percentage`, `buffer` | Mix of diagonal-aligned points and noisy buffer                            |
-| `Bit`          | `probability`, `digits` | Recursive binary split with resolution control                             |
-
-#### Example: USA Mainland Mapping
-
-The affine transform maps generated coordinates from the local unit square [0,1]² into real-world extents. For example, the following affine matrix maps coordinates to the continental USA bounding box:
-
-```rust
-let affine = Some([
-    58.368269, 0.0, -125.244606,  // scale X to ~58°, offset to ~-125°
-    0.0, 25.175375, 24.006328     // scale Y to ~25°, offset to ~24°
-]);
+```bash
+spatialbench-cli -s 1 --format=parquet --tables trip,building --config spatialbench-config.yml
 ```
 
-This maps:
-- x = 0 → -125.24, x = 1 → -66.87
-- y = 0 → 24.00, y = 1 → 49.18
+If `--config` is not provided, SpatialBench checks for `./spatialbench-config.yml`. If that file is absent, it falls back to built-in defaults.
+
+For reference, see the provided [spatialbench-config.yml](spatialbench-config.yml).
 
+See [SPIDER.md](SPIDER.md) for more details about spatial data generation and the full YAML schema and examples.
 
 ## Acknowledgements
 - [TPC-H](https://www.tpc.org/tpch/)
diff --git a/SPIDER.md b/SPIDER.md
new file mode 100644
index 0000000..8930bdd
--- /dev/null
+++ b/SPIDER.md
@@ -0,0 +1,124 @@
+# SpatialBench Spider Data Generator
+
+The Spider module is SpatialBench’s built-in spatial geometry generator.
+It creates Points, Boxes, and Polygons using deterministic random distributions.
+
+Spider is designed for benchmark reproducibility:
+- Generates millions of geometries per second.
+- Uses seeds for deterministic output.
+- Supports affine transforms to map the unit square [0,1]² into real-world coordinates.
+
+Reference: [SpiderWeb: A Spatial Data Generator on the Web](https://dl.acm.org/doi/10.1145/3397536.3422351) by Katiyar et al., SIGSPATIAL 2020.
+
+## Supported Distribution Types
+
+| Type         | Description                                                   |
+|--------------|---------------------------------------------------------------|
+| `UNIFORM`    | Uniformly distributed points in `[0,1]²`                      |
+| `NORMAL`     | 2D Gaussian distribution with configurable `mu` and `sigma`   |
+| `DIAGONAL`   | Points clustered along a diagonal                             |
+| `BIT`        | Points in a grid with `2^digits` resolution                   |
+| `SIERPINSKI` | Fractal pattern using Sierpinski triangle                     |
+
+![image.png](images/spatial_distributions.png)
+
+## Using Spider in the CLI
+
+```bash
+spatialbench-cli -s 1 --tables trip,building --config spatialbench-config.yml
+```
+
+If --config is omitted, SpatialBench will try a local default and then fall back to built-ins (see [Configuration Resolution & Logging](#configuration-resolution--logging)).
+
+## Expected Config File Structure
+
+At the top level, the YAML may define:
+
+```yaml
+trip:      # (optional) Config for Trip pickup points
+building:  # (optional) Config for Building polygons
+```
+
+Each entry must conform to the SpiderConfig schema:
+
+```yaml
+<name>:
+  dist_type: <string>        # uniform | normal | diagonal | bit | sierpinski | parcel
+  geom_type: <string>        # point | box | polygon
+  dim: <int>                 # usually 2
+  seed: <int>                # random seed for reproducibility
+  affine: [f64; 6]           # optional affine transform
+  width: <float>             # used if geom_type = box
+  height: <float>            # used if geom_type = box
+  maxseg: <int>              # polygon max segments
+  polysize: <float>          # polygon size or radius
+  params:                    # distribution-specific parameters
+    type: <string>           # one of: none, normal, diagonal, bit, parcel
+    ...                      # fields depend on type (see table below)
+```
+
+## Supported Distribution Parameters
+
+| Variant    | Field                  | Description                                                                |
+|------------|------------------------|----------------------------------------------------------------------------|
+| `None`     | `--`                   | For distributions like Uniform or Sierpinski that don’t require parameters |
+| `Normal`   | `mu`, `sigma`          | Controls center and spread for 2D Gaussian                                 |
+| `Diagonal` | `percentage`, `buffer` | Mix of diagonal-aligned points and noisy buffer                            |
+| `Bit`      | `probability`, `digits` | Recursive binary split with resolution control                             |
+
+## Default Configs
+
+The repository includes a ready-to-use default file:
+[`spatialbench-config.yml`](/spatialbench-config.yml).
+
+These defaults are automatically used if no `--config` is passed and the file exists in the current working directory.
+
+## Configuration Resolution & Logging
+
+When SpatialBench starts, it resolves configuration in this order:
+
+1. Explicit config: If --config <path> is provided, that file is used.
+2. Local default: If no flag is provided, SpatialBench looks for ./spatialbench-config.yml in the current directory.
+3. Built-ins: If neither is found, it uses compiled defaults from spider_defaults.rs.
+
+## Affine Transform
+
+The affine transform maps coordinates from the unit square [0,1]² into real-world ranges.
+It is expressed as an array of 6 numbers:
+
+```
+[a, b, c, d, e, f]
+```
+
+Applied as:
+
+```
+X = a*x + b*y + c
+Y = d*x + e*y + f
+```
+
+- a, e → scale factors in X and Y.
+- b, d → shear/skew (usually 0 for simple scaling).
+- c, f → translation offsets.
+
+#### How to fill it
+
+1. Decide the bounding box of your target region:
+   - Example (continental USA): [-125.24, 24.00, -66.87, 49.18] → west, south, east, north.
+2. Compute scale and offset:
+   - scale_x = (east - west)
+   - scale_y = (north - south)
+   - offset_x = west
+   - offset_y = south
+3. Plug into [a, b, c, d, e, f] with no skew:
+   - [scale_x, 0.0, offset_x, 0.0, scale_y, offset_y]
+
+#### Example: Mapping [0,1]² to Continental USA
+
+```yaml
+affine: [58.368269, 0.0, -125.244606, 0.0, 25.175375, 24.006328]
+```
+
+Which means:
+- x=0 → -125.24, x=1 → -66.87
+- y=0 → 24.00, y=1 → 49.18
\ No newline at end of file
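
A minimal sketch of the "How to fill it" steps from SPIDER.md above, written as a reviewer's illustration and not taken from the commit. The function names `affine_from_bbox` and `apply_affine` are hypothetical; only the coefficient order `[a, b, c, d, e, f]` and the formulas `X = a*x + b*y + c`, `Y = d*x + e*y + f` come from the document.

```rust
/// Builds the six affine coefficients [a, b, c, d, e, f] for a
/// west/south/east/north bounding box, with no shear (b = d = 0).
/// Hypothetical helper, not part of spatialbench.
fn affine_from_bbox(west: f64, south: f64, east: f64, north: f64) -> [f64; 6] {
    let scale_x = east - west; // a
    let scale_y = north - south; // e
    [scale_x, 0.0, west, 0.0, scale_y, south]
}

/// Applies X = a*x + b*y + c and Y = d*x + e*y + f to a point (x, y)
/// in the unit square. Hypothetical helper, not part of spatialbench.
fn apply_affine(m: &[f64; 6], x: f64, y: f64) -> (f64, f64) {
    let [a, b, c, d, e, f] = *m;
    (a * x + b * y + c, d * x + e * y + f)
}
```

Calling `affine_from_bbox(-180.0, -90.0, 180.0, 90.0)` reproduces the full-world array `[360.0, 0.0, -180.0, 0.0, 180.0, -90.0]` used in `spatialbench-config.yml`, and `apply_affine` then maps the unit-square corners `(0, 0)` and `(1, 1)` to `(-180, -90)` and `(180, 90)`.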
diff --git a/spatialbench-cli/Cargo.toml b/spatialbench-cli/Cargo.toml
index 55c63f3..92a6357 100644
--- a/spatialbench-cli/Cargo.toml
+++ b/spatialbench-cli/Cargo.toml
@@ -20,6 +20,9 @@ futures = "0.3.31"
 num_cpus = "1.0"
 log = "0.4.26"
 env_logger = "0.11.7"
+serde = { version = "1.0.219", features = ["derive"] }
+anyhow = "1.0.99"
+serde_yaml = "0.9.34+deprecated"
 
 [dev-dependencies]
 assert_cmd = "2.0"
diff --git a/spatialbench-cli/src/main.rs b/spatialbench-cli/src/main.rs
index 2b00979..02d0744 100644
--- a/spatialbench-cli/src/main.rs
+++ b/spatialbench-cli/src/main.rs
@@ -43,6 +43,7 @@ mod csv;
 mod generate;
 mod parquet;
 mod plan;
+mod spider_config_file;
 mod statistics;
 mod tbl;
 
@@ -50,6 +51,7 @@ use crate::csv::*;
 use crate::generate::{generate_in_chunks, Sink, Source};
 use crate::parquet::*;
 use crate::plan::GenerationPlan;
+use crate::spider_config_file::parse_yaml;
 use crate::statistics::WriteStatistics;
 use crate::tbl::*;
 use ::parquet::basic::Compression;
@@ -61,6 +63,7 @@ use spatialbench::generators::{
     BuildingGenerator, CustomerGenerator, DriverGenerator, TripGenerator, VehicleGenerator,
     ZoneGenerator,
 };
+use spatialbench::spider_overrides::{set_overrides, SpiderOverrides};
 use spatialbench::text::TextPool;
 use spatialbench_arrow::{
     BuildingArrow, CustomerArrow, DriverArrow, RecordBatchIterator, TripArrow, VehicleArrow,
@@ -90,6 +93,10 @@ struct Cli {
     #[arg(short = 'T', long = "tables", value_delimiter = ',', value_parser = TableValueParser)]
     tables: Option<Vec<Table>>,
 
+    /// YAML file path specifying configs for Trip and Building
+    #[arg(long = "config")]
+    config: Option<PathBuf>,
+
     /// Number of partitions to generate (manual parallel generation)
     #[arg(short, long)]
     parts: Option<i32>,
@@ -290,6 +297,46 @@ impl Cli {
             fs::create_dir_all(&self.output_dir)?;
         }
 
+        // Load overrides if provided or if default config file exists
+        let config_path = if let Some(path) = &self.config {
+            // Use explicitly provided config path
+            Some(path.clone())
+        } else {
+            // Look for default config file in current directory
+            let default_config = PathBuf::from("spatialbench-config.yml");
+            if default_config.exists() {
+                Some(default_config)
+            } else {
+                None
+            }
+        };
+
+        if let Some(path) = config_path {
+            let text = std::fs::read_to_string(&path).map_err(|e| {
+                io::Error::new(
+                    io::ErrorKind::InvalidInput,
+                    format!("Failed reading {}: {e}", path.display()),
+                )
+            })?;
+
+            match parse_yaml(&text) {
+                Ok(file_cfg) => {
+                    let trip = file_cfg.trip.as_ref().map(|c| c.to_generator());
+                    let building = file_cfg.building.as_ref().map(|c| c.to_generator());
+                    set_overrides(SpiderOverrides { trip, building });
+                    info!("Loaded spider configuration from {}", path.display());
+                }
+                Err(e) => {
+                    return Err(io::Error::new(
+                        io::ErrorKind::InvalidInput,
+                        format!("Failed parsing spider-config YAML: {e}"),
+                    ));
+                }
+            }
+        } else {
+            info!("Using default spider configuration from spider_defaults.rs");
+        }
+
         // Determine which tables to generate
         let tables: Vec<Table> = if let Some(tables) = self.tables.as_ref() {
             tables.clone()
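
The resolution order implemented in the hunk above can be sketched in isolation as follows. This is an illustration under stated assumptions, not code from the commit: `resolve_config_path` and its `dir` parameter are hypothetical (the commit checks the current working directory inline), and only the file name `spatialbench-config.yml` and the three-step order come from the diff.

```rust
use std::path::{Path, PathBuf};

/// Name of the local default file the CLI looks for (taken from the diff).
const DEFAULT_CONFIG: &str = "spatialbench-config.yml";

/// Returns the config file to load, or `None` to use the built-in defaults.
/// `dir` is the directory searched for the local default; it is a parameter
/// here (an assumption of this sketch) so the logic can be exercised without
/// changing the process working directory.
fn resolve_config_path(explicit: Option<&Path>, dir: &Path) -> Option<PathBuf> {
    // 1. An explicit --config path wins. As in the diff, it is not checked
    //    for existence here; a missing file surfaces later as a read error.
    if let Some(path) = explicit {
        return Some(path.to_path_buf());
    }
    // 2. Otherwise look for the local default file.
    let candidate = dir.join(DEFAULT_CONFIG);
    if candidate.exists() {
        Some(candidate)
    } else {
        // 3. Nothing found: the caller falls back to compiled defaults.
        None
    }
}
```

One consequence of this order is that a stray `spatialbench-config.yml` in the working directory changes the generated data even when no flag is passed, which is why the commit logs which source was used.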
diff --git a/spatialbench-cli/src/spider_config_file.rs b/spatialbench-cli/src/spider_config_file.rs
new file mode 100644
index 0000000..b6724ad
--- /dev/null
+++ b/spatialbench-cli/src/spider_config_file.rs
@@ -0,0 +1,149 @@
+use anyhow::Result;
+use serde::de::{self, Visitor};
+use serde::{Deserialize, Deserializer};
+use spatialbench::spider::{
+    DistributionParams, DistributionType, GeomType, SpiderConfig, SpiderGenerator,
+};
+use std::fmt;
+
+// Deserializer for DistributionType
+fn deserialize_distribution_type<'de, D>(deserializer: D) -> Result<DistributionType, D::Error>
+where
+    D: Deserializer<'de>,
+{
+    struct DistributionTypeVisitor;
+
+    impl Visitor<'_> for DistributionTypeVisitor {
+        type Value = DistributionType;
+
+        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
+            formatter.write_str("a string representing distribution type")
+        }
+
+        fn visit_str<E>(self, value: &str) -> Result<DistributionType, E>
+        where
+            E: de::Error,
+        {
+            match value.to_lowercase().as_str() {
+                "uniform" => Ok(DistributionType::Uniform),
+                "normal" => Ok(DistributionType::Normal),
+                "diagonal" => Ok(DistributionType::Diagonal),
+                "bit" => Ok(DistributionType::Bit),
+                "sierpinski" => Ok(DistributionType::Sierpinski),
+                _ => Err(E::custom(format!("unknown distribution type: {}", value))),
+            }
+        }
+    }
+
+    deserializer.deserialize_str(DistributionTypeVisitor)
+}
+
+// Deserializer for GeomType
+fn deserialize_geom_type<'de, D>(deserializer: D) -> Result<GeomType, D::Error>
+where
+    D: Deserializer<'de>,
+{
+    struct GeomTypeVisitor;
+
+    impl Visitor<'_> for GeomTypeVisitor {
+        type Value = GeomType;
+
+        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
+            formatter.write_str("a string representing geometry type")
+        }
+
+        fn visit_str<E>(self, value: &str) -> Result<GeomType, E>
+        where
+            E: de::Error,
+        {
+            match value.to_lowercase().as_str() {
+                "point" => Ok(GeomType::Point),
+                "box" => Ok(GeomType::Box),
+                "polygon" => Ok(GeomType::Polygon),
+                _ => Err(E::custom(format!("unknown geometry type: {}", value))),
+            }
+        }
+    }
+
+    deserializer.deserialize_str(GeomTypeVisitor)
+}
+
+#[derive(Deserialize)]
+pub struct SpiderConfigFile {
+    pub trip: Option<InlineSpiderConfig>,
+    pub building: Option<InlineSpiderConfig>,
+}
+
+#[derive(Deserialize)]
+pub struct InlineSpiderConfig {
+    #[serde(deserialize_with = "deserialize_distribution_type")]
+    pub dist_type: DistributionType,
+    #[serde(deserialize_with = "deserialize_geom_type")]
+    pub geom_type: GeomType,
+    pub dim: u8,
+    pub seed: u32,
+    pub affine: Option<[f64; 6]>,
+    // geometry = box
+    pub width: f64,
+    pub height: f64,
+    // geometry = polygon
+    pub maxseg: i32,
+    pub polysize: f64,
+    pub params: InlineParams,
+}
+
+#[derive(Deserialize)]
+#[serde(tag = "type", rename_all = "lowercase")]
+pub enum InlineParams {
+    None,
+    Normal { mu: f64, sigma: f64 },
+    Diagonal { percentage: f64, buffer: f64 },
+    Bit { probability: f64, digits: u32 },
+    Parcel { srange: f64, dither: f64 },
+}
+
+impl InlineSpiderConfig {
+    pub fn to_generator(&self) -> SpiderGenerator {
+        let params = match &self.params {
+            InlineParams::None => DistributionParams::None,
+            InlineParams::Normal { mu, sigma } => DistributionParams::Normal {
+                mu: *mu,
+                sigma: *sigma,
+            },
+            InlineParams::Diagonal { percentage, buffer } => DistributionParams::Diagonal {
+                percentage: *percentage,
+                buffer: *buffer,
+            },
+            InlineParams::Bit {
+                probability,
+                digits,
+            } => DistributionParams::Bit {
+                probability: *probability,
+                digits: *digits,
+            },
+            InlineParams::Parcel { srange, dither } => DistributionParams::Parcel {
+                srange: *srange,
+                dither: *dither,
+            },
+        };
+
+        let cfg = SpiderConfig {
+            dist_type: self.dist_type,
+            geom_type: self.geom_type,
+            dim: self.dim as i32,
+            seed: self.seed,
+            affine: self.affine,
+            width: self.width,
+            height: self.height,
+            maxseg: self.maxseg,
+            polysize: self.polysize,
+            params,
+        };
+        SpiderGenerator::new(cfg)
+    }
+}
+
+pub fn parse_yaml(text: &str) -> Result<SpiderConfigFile> {
+    log::info!("Default spider config is being overridden by user-provided configuration");
+    Ok(serde_yaml::from_str::<SpiderConfigFile>(text)?)
+}
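
The case-insensitive name matching used by the two visitors above can be sketched with the standard library alone. `DistKind` and `parse_dist_kind` are hypothetical stand-ins for `DistributionType` and the serde visitor, so this is an illustration of the matching rule only. It also makes one point visible: SPIDER.md lists `parcel` as a `dist_type` value, but the match arms shown in this diff cover only five names, so under this sketch `parcel` is rejected.

```rust
/// Local stand-in for spatialbench's DistributionType (illustration only).
#[derive(Debug, PartialEq, Clone, Copy)]
enum DistKind {
    Uniform,
    Normal,
    Diagonal,
    Bit,
    Sierpinski,
}

/// Case-insensitive lookup mirroring the match arms in the visitor above.
/// The error text echoes the caller's original spelling, as the visitor does.
fn parse_dist_kind(value: &str) -> Result<DistKind, String> {
    match value.to_lowercase().as_str() {
        "uniform" => Ok(DistKind::Uniform),
        "normal" => Ok(DistKind::Normal),
        "diagonal" => Ok(DistKind::Diagonal),
        "bit" => Ok(DistKind::Bit),
        "sierpinski" => Ok(DistKind::Sierpinski),
        _ => Err(format!("unknown distribution type: {value}")),
    }
}
```

Because the input is lowercased before matching, `BIT`, `Bit`, and `bit` are all accepted, which agrees with the upper-case names in the distribution table and the lower-case names in the YAML schema.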
diff --git a/spatialbench-config.yml b/spatialbench-config.yml
new file mode 100644
index 0000000..66a2fbd
--- /dev/null
+++ b/spatialbench-config.yml
@@ -0,0 +1,29 @@
+trip:
+  dist_type: bit
+  geom_type: point
+  dim: 2
+  seed: 42
+  affine: [360.0, 0.0, -180.0, 0.0, 180.0, -90.0]
+  width: 0.0
+  height: 0.0
+  maxseg: 0
+  polysize: 0.0
+  params:
+    type: bit
+    probability: 0.2
+    digits: 10
+
+building:
+  dist_type: bit
+  geom_type: box
+  dim: 2
+  seed: 12345
+  affine: [360.0, 0.0, -180.0, 0.0, 180.0, -90.0]
+  width: 0.00005
+  height: 0.0001
+  maxseg: 0
+  polysize: 0.0
+  params:
+    type: bit
+    probability: 0.5
+    digits: 20
\ No newline at end of file
diff --git a/spatialbench/Cargo.toml b/spatialbench/Cargo.toml
index 8eac353..07d3faf 100644
--- a/spatialbench/Cargo.toml
+++ b/spatialbench/Cargo.toml
@@ -16,6 +16,7 @@ rand = { version = "0.8", features = ["small_rng"] }
 duckdb = { version = "1.3.0", features = ["bundled"] }
 geo = { workspace = true }
 geozero = { workspace = true }
+once_cell = "1.21.3"
 log = "0.4.27"
 
 [dev-dependencies]
diff --git a/spatialbench/src/generators.rs b/spatialbench/src/generators.rs
index 8ab801c..043044a 100644
--- a/spatialbench/src/generators.rs
+++ b/spatialbench/src/generators.rs
@@ -10,7 +10,8 @@ use crate::random::{PhoneNumberInstance, RandomBoundedLong, StringSequenceInstan
 use crate::random::{RandomAlphaNumeric, RandomAlphaNumericInstance};
 use crate::random::{RandomBoundedInt, RandomString, RandomStringSequence, RandomText};
 use crate::spider::{spider_seed_for_index, SpiderGenerator};
-use crate::spider_presets::SpiderPresets;
+use crate::spider_defaults::SpiderDefaults;
+use crate::spider_overrides;
 use crate::text::TextPool;
 use duckdb::Connection;
 use geo::Geometry;
@@ -909,7 +910,7 @@ impl TripGenerator {
             Distributions::static_default(),
             TextPool::get_or_init_default(),
             crate::kde::default_distance_kde(),
-            SpiderPresets::for_trip_pickups4(),
+            spider_overrides::trip_or_default(SpiderDefaults::trip_default),
         )
     }
 
@@ -1247,14 +1248,13 @@ impl<'a> BuildingGenerator<'a> {
     /// Note the generator's lifetime is `&'static`. See [`BuildingGenerator`] for
     /// more details.
     pub fn new(scale_factor: f64, part: i32, part_count: i32) -> BuildingGenerator<'static> {
-        // Note: use explicit lifetime to ensure this remains `&'static`
         Self::new_with_distributions_and_text_pool(
             scale_factor,
             part,
             part_count,
             Distributions::static_default(),
             TextPool::get_or_init_default(),
-            SpiderPresets::for_building_polygons(),
+            spider_overrides::building_or_default(SpiderDefaults::building_default),
         )
     }
 
diff --git a/spatialbench/src/lib.rs b/spatialbench/src/lib.rs
index 812a0af..47a61cf 100644
--- a/spatialbench/src/lib.rs
+++ b/spatialbench/src/lib.rs
@@ -60,5 +60,6 @@ pub mod kde;
 pub mod q_and_a;
 pub mod random;
 pub mod spider;
-pub mod spider_presets;
+pub mod spider_defaults;
+pub mod spider_overrides;
 pub mod text;
diff --git a/spatialbench/src/spider_defaults.rs b/spatialbench/src/spider_defaults.rs
new file mode 100644
index 0000000..043364c
--- /dev/null
+++ b/spatialbench/src/spider_defaults.rs
@@ -0,0 +1,62 @@
+use crate::spider::{
+    DistributionParams, DistributionType, GeomType, SpiderConfig, SpiderGenerator,
+};
+
+pub struct SpiderDefaults;
+
+impl SpiderDefaults {
+    const FULL_WORLD_AFFINE: [f64; 6] = [
+        360.0, // Scale X to cover full longitude range (-180° to 180°)
+        0.0, -180.0, // Offset X to start at -180° (west edge of map)
+        0.0, 180.0, // Scale Y to cover full latitude range (-90° to 90°)
+        -90.0, // Offset Y to start at -90° (south edge of map)
+    ];
+
+    pub fn trip_default() -> SpiderGenerator {
+        let config = SpiderConfig {
+            dist_type: DistributionType::Bit,
+            geom_type: GeomType::Point,
+            dim: 2,
+            seed: 42,
+            affine: Some(Self::FULL_WORLD_AFFINE),
+
+            // geometry = box
+            width: 0.0,
+            height: 0.0,
+
+            // geometry = polygon
+            maxseg: 0,
+            polysize: 0.0,
+
+            params: DistributionParams::Bit {
+                probability: 0.2,
+                digits: 10,
+            },
+        };
+        SpiderGenerator::new(config)
+    }
+
+    pub fn building_default() -> SpiderGenerator {
+        let config = SpiderConfig {
+            dist_type: DistributionType::Bit,
+            geom_type: GeomType::Box,
+            dim: 2,
+            seed: 12345,
+            affine: Some(Self::FULL_WORLD_AFFINE),
+
+            // geometry = box
+            width: 0.00005,
+            height: 0.0001,
+
+            // geometry = polygon
+            maxseg: 0,
+            polysize: 0.0,
+
+            params: DistributionParams::Bit {
+                probability: 0.5,
+                digits: 20,
+            },
+        };
+        SpiderGenerator::new(config)
+    }
+}
diff --git a/spatialbench/src/spider_overrides.rs b/spatialbench/src/spider_overrides.rs
new file mode 100644
index 0000000..830cad6
--- /dev/null
+++ b/spatialbench/src/spider_overrides.rs
@@ -0,0 +1,28 @@
+use crate::spider::SpiderGenerator;
+use once_cell::sync::OnceCell;
+
+#[derive(Clone, Default)]
+pub struct SpiderOverrides {
+    pub trip: Option<SpiderGenerator>,
+    pub building: Option<SpiderGenerator>,
+}
+
+static OVERRIDES: OnceCell<SpiderOverrides> = OnceCell::new();
+
+pub fn set_overrides(o: SpiderOverrides) {
+    let _ = OVERRIDES.set(o);
+}
+
+pub fn trip_or_default<F: FnOnce() -> SpiderGenerator>(fallback: F) -> SpiderGenerator {
+    OVERRIDES
+        .get()
+        .and_then(|o| o.trip.clone())
+        .unwrap_or_else(fallback)
+}
+
+pub fn building_or_default<F: FnOnce() -> SpiderGenerator>(fallback: F) -> SpiderGenerator {
+    OVERRIDES
+        .get()
+        .and_then(|o| o.building.clone())
+        .unwrap_or_else(fallback)
+}
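
`set_overrides` above discards the result of `OVERRIDES.set(o)`, so only the first call in a process takes effect and later calls are silently ignored. The sketch below shows that behaviour with `std::sync::OnceLock`, used here as a standard-library stand-in for `once_cell::sync::OnceCell` (an assumption of this example; the commit itself uses `once_cell`). The `Overrides` struct and its `trip_seed` field are hypothetical simplifications of `SpiderOverrides`.

```rust
use std::sync::OnceLock;

/// Simplified stand-in for SpiderOverrides (illustration only).
#[derive(Clone, Debug)]
struct Overrides {
    trip_seed: Option<u32>,
}

static OVERRIDES: OnceLock<Overrides> = OnceLock::new();

/// Installs the overrides. Returns `true` if this call set the value and
/// `false` if a value was already present (the new one is dropped).
fn set_overrides(o: Overrides) -> bool {
    OVERRIDES.set(o).is_ok()
}

/// Returns the overridden seed if one was installed, else the fallback.
fn trip_seed_or_default(fallback: impl FnOnce() -> u32) -> u32 {
    OVERRIDES
        .get()
        .and_then(|o| o.trip_seed)
        .unwrap_or_else(fallback)
}
```

Returning the `bool` is a design choice of this sketch: it lets a caller log or reject a second attempt to install overrides, whereas the committed `set_overrides` ignores it.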
diff --git a/spatialbench/src/spider_presets.rs b/spatialbench/src/spider_presets.rs
deleted file mode 100644
index 0ab4077..0000000
--- a/spatialbench/src/spider_presets.rs
+++ /dev/null
@@ -1,151 +0,0 @@
-use crate::spider::{
-    DistributionParams, DistributionType, GeomType, SpiderConfig, SpiderGenerator,
-};
-
-pub struct SpiderPresets;
-
-impl SpiderPresets {
-    const FULL_WORLD_AFFINE: [f64; 6] = [
-        360.0, // Scale X to cover full longitude range (-180° to 180°)
-        0.0, -180.0, // Offset X to start at -180° (west edge of map)
-        0.0, 180.0, // Scale Y to cover full latitude range (-90° to 90°)
-        -90.0, // Offset Y to start at -90° (south edge of map)
-    ];
-    pub fn for_trip_pickups() -> SpiderGenerator {
-        let config = SpiderConfig {
-            dist_type: DistributionType::Uniform,
-            geom_type: GeomType::Point,
-            dim: 2,
-            seed: 42,
-            affine: Some(Self::FULL_WORLD_AFFINE),
-
-            // geometry = box
-            width: 0.0,
-            height: 0.0,
-
-            // geometry = polygon
-            maxseg: 0,
-            polysize: 0.0,
-
-            params: DistributionParams::None,
-        };
-        SpiderGenerator::new(config)
-    }
-
-    pub fn for_trip_pickups2() -> SpiderGenerator {
-        let config = SpiderConfig {
-            dist_type: DistributionType::Diagonal,
-            geom_type: GeomType::Point,
-            dim: 2,
-            seed: 42,
-            affine: Some(Self::FULL_WORLD_AFFINE),
-
-            // geometry = box
-            width: 0.0,
-            height: 0.0,
-
-            // geometry = polygon
-            maxseg: 0,
-            polysize: 0.0,
-
-            params: DistributionParams::Diagonal {
-                percentage: 0.5,
-                buffer: 0.5,
-            },
-        };
-        SpiderGenerator::new(config)
-    }
-
-    pub fn for_trip_pickups3() -> SpiderGenerator {
-        let config = SpiderConfig {
-            dist_type: DistributionType::Sierpinski,
-            geom_type: GeomType::Point,
-            dim: 2,
-            seed: 42,
-            affine: Some(Self::FULL_WORLD_AFFINE),
-
-            // geometry = box
-            width: 0.0,
-            height: 0.0,
-
-            // geometry = polygon
-            maxseg: 0,
-            polysize: 0.0,
-
-            params: DistributionParams::None,
-        };
-        SpiderGenerator::new(config)
-    }
-
-    pub fn for_trip_pickups4() -> SpiderGenerator {
-        let config = SpiderConfig {
-            dist_type: DistributionType::Bit,
-            geom_type: GeomType::Point,
-            dim: 2,
-            seed: 42,
-            affine: Some(Self::FULL_WORLD_AFFINE),
-
-            // geometry = box
-            width: 0.0,
-            height: 0.0,
-
-            // geometry = polygon
-            maxseg: 0,
-            polysize: 0.0,
-
-            params: DistributionParams::Bit {
-                probability: 0.2,
-                digits: 10,
-            },
-        };
-        SpiderGenerator::new(config)
-    }
-
-    pub fn for_trip_pickups5() -> SpiderGenerator {
-        let config = SpiderConfig {
-            dist_type: DistributionType::Normal,
-            geom_type: GeomType::Point,
-            dim: 2,
-            seed: 42,
-            affine: Some(Self::FULL_WORLD_AFFINE),
-
-            // geometry = box
-            width: 0.0,
-            height: 0.0,
-
-            // geometry = polygon
-            maxseg: 0,
-            polysize: 0.0,
-
-            params: DistributionParams::Normal {
-                mu: 0.5,
-                sigma: 0.1,
-            },
-        };
-        SpiderGenerator::new(config)
-    }
-
-    pub fn for_building_polygons() -> SpiderGenerator {
-        let config = SpiderConfig {
-            dist_type: DistributionType::Bit,
-            geom_type: GeomType::Box,
-            dim: 2,
-            seed: 12345,
-            affine: Some(Self::FULL_WORLD_AFFINE),
-
-            // geometry = box
-            width: 0.00005,
-            height: 0.0001,
-
-            // geometry = polygon
-            maxseg: 0,
-            polysize: 0.0,
-
-            params: DistributionParams::Bit {
-                probability: 0.5,
-                digits: 20,
-            },
-        };
-        SpiderGenerator::new(config)
-    }
-}
