This is an automated email from the ASF dual-hosted git repository.

paleolimbot pushed a commit to branch branch-0.3.0
in repository https://gitbox.apache.org/repos/asf/sedona-db.git

commit 31adbe10e55df8503a74867bd388abd76653aa5f
Author: Kristin Cowalcijk <[email protected]>
AuthorDate: Wed Mar 4 23:39:41 2026 +0800

    docs: add memory management and spill configuration guide (#679)
    
    Co-authored-by: Copilot <[email protected]>
---
 docs/memory-management.ipynb | 384 +++++++++++++++++++++++++++++++++++++++++++
 docs/memory-management.md    | 240 +++++++++++++++++++++++++++
 mkdocs.yml                   |   2 +
 3 files changed, 626 insertions(+)

diff --git a/docs/memory-management.ipynb b/docs/memory-management.ipynb
new file mode 100644
index 00000000..0c338d4c
--- /dev/null
+++ b/docs/memory-management.ipynb
@@ -0,0 +1,384 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "a2e6cc7b",
+   "metadata": {},
+   "source": [
+    "<!---\n",
+    "  Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  or more contributor license agreements.  See the NOTICE file\n",
+    "  distributed with this work for additional information\n",
+    "  regarding copyright ownership.  The ASF licenses this file\n",
+    "  to you under the Apache License, Version 2.0 (the\n",
+    "  \"License\"); you may not use this file except in compliance\n",
+    "  with the License.  You may obtain a copy of the License at\n",
+    "\n",
+    "    http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "\n",
+    "  Unless required by applicable law or agreed to in writing,\n",
+    "  software distributed under the License is distributed on an\n",
+    "  \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  KIND, either express or implied.  See the License for the\n",
+    "  specific language governing permissions and limitations\n",
+    "  under the License.\n",
+    "-->\n",
+    "\n",
+    "# Memory Management and Spilling\n",
+    "\n",
+    "SedonaDB supports memory-limited execution with automatic spill-to-disk, 
allowing you to process datasets that are larger than available memory. When a 
memory limit is configured, operators that exceed their memory budget 
automatically spill intermediate data to temporary files on disk and read them 
back as needed."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e3a30ea9",
+   "metadata": {},
+   "source": [
+    "## Configuring Memory Limits\n",
+    "\n",
+    "Set `memory_limit` on the context options to cap the total memory 
available for query execution. The limit accepts an integer (bytes) or a 
human-readable string such as `\"4gb\"`, `\"512m\"`, or `\"1.5g\"`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d1f99fcf",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sedona.db\n",
+    "\n",
+    "sd = sedona.db.connect()\n",
+    "sd.options.memory_limit = \"4gb\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1fdc73aa",
+   "metadata": {},
+   "source": [
+    "Without a memory limit, SedonaDB uses an unbounded memory pool and 
operators can use as much memory as needed (until the process hits system 
limits). In this mode, operators typically won't spill to disk because there is 
no memory budget to enforce.\n",
+    "\n",
+    "> **Note:** All runtime options (`memory_limit`, `memory_pool_type`, 
`temp_dir`, `unspillable_reserve_ratio`) must be set before the internal 
context is initialized. The internal context is created on the first call to 
`sd.sql(...)` (including `SET` statements) or any read method (for example, 
`sd.read_parquet(...)`) -- not when you call `.execute()` on the returned 
DataFrame. Once the internal context is created, these runtime options become 
read-only."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c0f91c15",
+   "metadata": {},
+   "source": [
+    "## Memory Pool Types\n",
+    "\n",
+    "The `memory_pool_type` option controls how the memory budget is 
distributed among concurrent operators. Two pool types are available:\n",
+    "\n",
+    "- **`\"greedy\"`** -- Grants memory reservations on a 
first-come-first-served basis. This is the default when no pool type is 
specified. Simple, but can lead to memory reservation failures under pressure 
-- one consumer may exhaust the pool before others get a chance to reserve 
memory.\n",
+    "- **`\"fair\"` (recommended)** -- Distributes memory fairly among 
spillable consumers and reserves a fraction of the pool for unspillable 
consumers. More stable under memory pressure and significantly less likely to 
cause reservation failures, at the cost of slightly lower utilization of the 
total reserved memory.\n",
+    "\n",
+    "We recommend using `\"fair\"` whenever a memory limit is configured."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b1dff726",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sedona.db\n",
+    "\n",
+    "sd = sedona.db.connect()\n",
+    "sd.options.memory_limit = \"4gb\"\n",
+    "sd.options.memory_pool_type = \"fair\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bd4c0a76",
+   "metadata": {},
+   "source": [
+    "> **Note:** `memory_pool_type` only takes effect when `memory_limit` is 
set."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "969f4fe0",
+   "metadata": {},
+   "source": [
+    "### Unspillable reserve ratio\n",
+    "\n",
+    "When using the `\"fair\"` pool, the `unspillable_reserve_ratio` option 
controls the fraction of the memory pool reserved for unspillable consumers 
(operators that cannot spill their memory to disk). It accepts a float between 
`0.0` and `1.0` and defaults to `0.2` (20%) when not explicitly set."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dc0718cf",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sedona.db\n",
+    "\n",
+    "sd = sedona.db.connect()\n",
+    "sd.options.memory_limit = \"8gb\"\n",
+    "sd.options.memory_pool_type = \"fair\"\n",
+    "sd.options.unspillable_reserve_ratio = 0.3  # reserve 30% for unspillable 
consumers"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fb1c9e39",
+   "metadata": {},
+   "source": [
+    "## Temporary Directory for Spill Files\n",
+    "\n",
+    "By default, DataFusion uses the system temporary directory for spill 
files. You can override this with `temp_dir` to control where spill data is 
written -- for example, to point to a larger or faster disk."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c8d7a5c9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sedona.db\n",
+    "\n",
+    "sd = sedona.db.connect()\n",
+    "sd.options.memory_limit = \"4gb\"\n",
+    "sd.options.memory_pool_type = \"fair\"\n",
+    "sd.options.temp_dir = \"/mnt/fast-ssd/sedona-spill\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5d318b8f",
+   "metadata": {},
+   "source": [
+    "## Example: Spatial Join with Limited Memory\n",
+    "\n",
+    "This example performs a spatial join between Natural Earth cities 
(points) and Natural Earth countries (polygons) using `ST_Contains`. Spatial 
joins are one of the most common workloads that benefit from memory limits and 
spill-to-disk."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "1ed77d58",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "┌───────────────┬─────────────────────────────┐\n",
+      "│   city_name   ┆         country_name        │\n",
+      "│      utf8     ┆             utf8            │\n",
+      "╞═══════════════╪═════════════════════════════╡\n",
+      "│ Suva          ┆ Fiji                        │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Dodoma        ┆ United Republic of Tanzania │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Dar es Salaam ┆ United Republic of Tanzania │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Bir Lehlou    ┆ Western Sahara              │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Ottawa        ┆ Canada                      │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Vancouver     ┆ Canada                      │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Toronto       ┆ Canada                      │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ San Francisco ┆ United States of America    │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Denver        ┆ United States of America    │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Houston       ┆ United States of America    │\n",
+      "└───────────────┴─────────────────────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "import sedona.db\n",
+    "\n",
+    "sd = sedona.db.connect()\n",
+    "\n",
+    "# Configure runtime options before any sd.sql(...) or sd.read_* call.\n",
+    "sd.options.memory_limit = \"4gb\"\n",
+    "sd.options.memory_pool_type = \"fair\"\n",
+    "sd.options.unspillable_reserve_ratio = 0.2\n",
+    "sd.options.temp_dir = \"/tmp/sedona-spill\"\n",
+    "\n",
+    "cities = sd.read_parquet(\n",
+    "    \"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities_geo.parquet\"\n",
+    ")\n",
+    "cities.to_view(\"cities\", overwrite=True)\n",
+    "\n",
+    "countries = sd.read_parquet(\n",
+    "    \"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_countries_geo.parquet\"\n",
+    ")\n",
+    "countries.to_view(\"countries\", overwrite=True)\n",
+    "\n",
+    "sd.sql(\n",
+    "    \"\"\"\n",
+    "    SELECT\n",
+    "        cities.name city_name,\n",
+    "        countries.name country_name\n",
+    "    FROM cities\n",
+    "    JOIN countries\n",
+    "      ON ST_Contains(countries.geometry, cities.geometry)\n",
+    "    \"\"\"\n",
+    ").show(10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b97c0e8b",
+   "metadata": {},
+   "source": [
+    "## Operators Supporting Memory Limits\n",
+    "\n",
+    "When a memory limit is configured, the following operators automatically 
spill intermediate data to disk when they exceed their memory budget.\n",
+    "\n",
+    "In practice, this means memory limits and spilling can apply to both 
SedonaDB's spatial operators and DataFusion's general-purpose operators used by 
common SQL constructs.\n",
+    "\n",
+    "**SedonaDB:**\n",
+    "\n",
+    "- **Spatial joins** -- Both the build-side (index construction, partition 
collection) and probe-side (stream repartitioning) of SedonaDB's spatial joins 
support memory-pressure-driven spilling.\n",
+    "\n",
+    "**DataFusion (physical operators):**\n",
+    "\n",
+    "This list is not exhaustive. Many other DataFusion physical operators and 
execution strategies may allocate memory through the same runtime memory pool 
and may spill to disk when memory limits are enforced.\n",
+    "\n",
+    "- **`ORDER BY` / sorted Top-K** (`SortExec`) -- External sort that spills 
sorted runs to disk when memory is exhausted, then merges them.\n",
+    "- **Hash joins** (`HashJoinExec`) -- Hash join does not support spilling 
yet. The query will fail with a memory reservation error if the hash table 
exceeds the memory limit.\n",
+    "- **Sort-merge joins** (`SortMergeJoinExec`) -- Sort-merge join that 
spills buffered batches to disk when the memory limit is exceeded.\n",
+    "- **`GROUP BY` aggregations** (`AggregateExec`) -- Grouped aggregation 
that spills intermediate aggregation state to sorted spill files when memory is 
exhausted."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c56b821f",
+   "metadata": {},
+   "source": [
+    "## Advanced DataFusion Configurations\n",
+    "\n",
+    "DataFusion provides additional execution configurations that affect spill 
behavior. These can be set via SQL `SET` statements after connecting.\n",
+    "\n",
+    "> **Note:** Calling `sd.sql(...)` initializes the internal context 
immediately (including `sd.sql(\"SET ...\")`) and freezes runtime options 
immediately. Configure `sd.options.*` runtime options (like `memory_limit` and 
`temp_dir`) before calling any `sd.sql(...)`, including `SET` statements."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6dd9f5b7",
+   "metadata": {},
+   "source": [
+    "### Spill compression\n",
+    "\n",
+    "By default, data is written to spill files uncompressed. Enabling 
compression reduces the amount of disk I/O and disk space used at the cost of 
additional CPU work. This is beneficial when disk I/O throughput is low or when 
disk space is not large enough to hold uncompressed spill data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "10cf38cb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sedona.db\n",
+    "\n",
+    "sd = sedona.db.connect()\n",
+    "sd.options.memory_limit = \"4gb\"\n",
+    "sd.options.memory_pool_type = \"fair\"\n",
+    "\n",
+    "# Enable LZ4 compression for spill files.\n",
+    "sd.sql(\"SET datafusion.execution.spill_compression = 
'lz4_frame'\").execute()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1a3ce16c",
+   "metadata": {},
+   "source": [
+    "### Maximum temporary directory size\n",
+    "\n",
+    "DataFusion limits the total size of temporary spill files to prevent 
unbounded disk usage. The default limit is **100 G**. If your workload needs to 
spill more data than this, increase the limit."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "eac829ad",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sedona.db\n",
+    "\n",
+    "sd = sedona.db.connect()\n",
+    "sd.options.memory_limit = \"4gb\"\n",
+    "sd.options.memory_pool_type = \"fair\"\n",
+    "\n",
+    "# Increase the spill directory size limit to 500 GB.\n",
+    "sd.sql(\"SET datafusion.runtime.max_temp_directory_size = 
'500G'\").execute()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dd8c2c23",
+   "metadata": {},
+   "source": [
+    "## System Configuration\n",
+    "\n",
+    "### Maximum number of open files\n",
+    "\n",
+    "Large workloads that spill heavily can create a large number of temporary 
files. During a spatial join, each parallel execution thread may create one 
spill file per spatial partition. The total number of open spill files can 
therefore reach **parallelism x number of spatial partitions**. For example, on 
an 8-CPU host running a spatial join that produces 500 spatial partitions, up 
to **8 x 500 = 4,000** spill files may be open simultaneously -- far exceeding 
the default per-process f [...]
+    "\n",
+    "The operating system's per-process file descriptor limit must be high 
enough to accommodate this, otherwise queries will fail with \"too many open 
files\" errors.\n",
+    "\n",
+    "**Linux:**\n",
+    "\n",
+    "The default limit is typically 1024, which is easily exceeded by 
spill-heavy workloads like the example above.\n",
+    "\n",
+    "To raise the limit permanently, add the following to 
`/etc/security/limits.conf`:\n",
+    "\n",
+    "```\n",
+    "*    soft    nofile    65535\n",
+    "*    hard    nofile    65535\n",
+    "```\n",
+    "\n",
+    "Then log out and back in (or reboot) for the change to take effect. 
Verify with:\n",
+    "\n",
+    "```bash\n",
+    "ulimit -n\n",
+    "```\n",
+    "\n",
+    "**macOS:**\n",
+    "\n",
+    "```bash\n",
+    "ulimit -n 65535\n",
+    "```\n",
+    "\n",
+    "This affects the current shell session. Persistent/system-wide limits are 
OS and configuration dependent; consult your macOS configuration and 
documentation if you need to raise the hard limit."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "sedonadb",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/memory-management.md b/docs/memory-management.md
new file mode 100644
index 00000000..0318e807
--- /dev/null
+++ b/docs/memory-management.md
@@ -0,0 +1,240 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Memory Management and Spilling
+
+SedonaDB supports memory-limited execution with automatic spill-to-disk, 
allowing you to process datasets that are larger than available memory. When a 
memory limit is configured, operators that exceed their memory budget 
automatically spill intermediate data to temporary files on disk and read them 
back as needed.
+
+## Configuring Memory Limits
+
+Set `memory_limit` on the context options to cap the total memory available 
for query execution. The limit accepts an integer (bytes) or a human-readable 
string such as `"4gb"`, `"512m"`, or `"1.5g"`.
+
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+sd.options.memory_limit = "4gb"
+```
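For intuition, here is a rough sketch of how those human-readable size strings could map to byte counts. This is an illustrative parser using binary units, not SedonaDB's actual implementation; the real accepted grammar and unit semantics may differ.

```python
import re

def parse_memory_limit(limit):
    """Illustrative parser for human-readable sizes (not SedonaDB's actual code)."""
    if isinstance(limit, int):
        return limit  # already a byte count
    # Binary units are assumed here: 1k = 1024 bytes, 1m = 1024**2, and so on.
    units = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}
    match = re.fullmatch(r"([0-9.]+)\s*([kmgt]?)b?", limit.strip().lower())
    if not match:
        raise ValueError(f"unrecognized memory limit: {limit!r}")
    number, unit = match.groups()
    return int(float(number) * units[unit or "b"])

print(parse_memory_limit("4gb"))   # 4294967296
print(parse_memory_limit("512m"))  # 536870912
print(parse_memory_limit("1.5g"))  # 1610612736
```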
+
+Without a memory limit, SedonaDB uses an unbounded memory pool and operators 
can use as much memory as needed (until the process hits system limits). In 
this mode, operators typically won't spill to disk because there is no memory 
budget to enforce.
+
+> **Note:** All runtime options (`memory_limit`, `memory_pool_type`, 
`temp_dir`, `unspillable_reserve_ratio`) must be set before the internal 
context is initialized. The internal context is created on the first call to 
`sd.sql(...)` (including `SET` statements) or any read method (for example, 
`sd.read_parquet(...)`) -- not when you call `.execute()` on the returned 
DataFrame. Once the internal context is created, these runtime options become 
read-only.
+
+## Memory Pool Types
+
+The `memory_pool_type` option controls how the memory budget is distributed 
among concurrent operators. Two pool types are available:
+
+- **`"greedy"`** -- Grants memory reservations on a first-come-first-served 
basis. This is the default when no pool type is specified. Simple, but can lead 
to memory reservation failures under pressure -- one consumer may exhaust the 
pool before others get a chance to reserve memory.
+- **`"fair"` (recommended)** -- Distributes memory fairly among spillable 
consumers and reserves a fraction of the pool for unspillable consumers. More 
stable under memory pressure and significantly less likely to cause reservation 
failures, at the cost of slightly lower utilization of the total reserved 
memory.
+
+We recommend using `"fair"` whenever a memory limit is configured.
+
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+sd.options.memory_limit = "4gb"
+sd.options.memory_pool_type = "fair"
+```
+
+> **Note:** `memory_pool_type` only takes effect when `memory_limit` is set.
+
+### Unspillable reserve ratio
+
+When using the `"fair"` pool, the `unspillable_reserve_ratio` option controls 
the fraction of the memory pool reserved for unspillable consumers (operators 
that cannot spill their memory to disk). It accepts a float between `0.0` and 
`1.0` and defaults to `0.2` (20%) when not explicitly set.
+
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+sd.options.memory_limit = "8gb"
+sd.options.memory_pool_type = "fair"
+sd.options.unspillable_reserve_ratio = 0.3  # reserve 30% for unspillable 
consumers
+```
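To make the split concrete, this back-of-the-envelope calculation shows how much of an 8 GB pool a ratio of `0.3` would set aside. It is illustrative arithmetic only; the pool's internal accounting may differ.

```python
memory_limit = 8 * 1024**3           # "8gb" expressed in bytes
unspillable_reserve_ratio = 0.3

# Fraction of the pool set aside for operators that cannot spill.
reserved_for_unspillable = int(memory_limit * unspillable_reserve_ratio)
# The remainder is shared among spillable consumers.
available_to_spillable = memory_limit - reserved_for_unspillable

print(reserved_for_unspillable)  # 2576980377 (~2.4 GiB)
print(available_to_spillable)    # 6012954215 (~5.6 GiB)
```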
+
+## Temporary Directory for Spill Files
+
+By default, DataFusion uses the system temporary directory for spill files. 
You can override this with `temp_dir` to control where spill data is written -- 
for example, to point to a larger or faster disk.
+
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+sd.options.memory_limit = "4gb"
+sd.options.memory_pool_type = "fair"
+sd.options.temp_dir = "/mnt/fast-ssd/sedona-spill"
+```
+
+## Example: Spatial Join with Limited Memory
+
+This example performs a spatial join between Natural Earth cities (points) and 
Natural Earth countries (polygons) using `ST_Contains`. Spatial joins are one 
of the most common workloads that benefit from memory limits and spill-to-disk.
+
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+
+# Configure runtime options before any sd.sql(...) or sd.read_* call.
+sd.options.memory_limit = "4gb"
+sd.options.memory_pool_type = "fair"
+sd.options.unspillable_reserve_ratio = 0.2
+sd.options.temp_dir = "/tmp/sedona-spill"
+
+cities = sd.read_parquet(
+    "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities_geo.parquet"
+)
+cities.to_view("cities", overwrite=True)
+
+countries = sd.read_parquet(
+    "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_countries_geo.parquet"
+)
+countries.to_view("countries", overwrite=True)
+
+sd.sql(
+    """
+    SELECT
+        cities.name city_name,
+        countries.name country_name
+    FROM cities
+    JOIN countries
+      ON ST_Contains(countries.geometry, cities.geometry)
+    """
+).show(10)
+```
+
+    ┌───────────────┬─────────────────────────────┐
+    │   city_name   ┆         country_name        │
+    │      utf8     ┆             utf8            │
+    ╞═══════════════╪═════════════════════════════╡
+    │ Suva          ┆ Fiji                        │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ Dodoma        ┆ United Republic of Tanzania │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ Dar es Salaam ┆ United Republic of Tanzania │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ Bir Lehlou    ┆ Western Sahara              │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ Ottawa        ┆ Canada                      │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ Vancouver     ┆ Canada                      │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ Toronto       ┆ Canada                      │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ San Francisco ┆ United States of America    │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ Denver        ┆ United States of America    │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ Houston       ┆ United States of America    │
+    └───────────────┴─────────────────────────────┘
+
+
+## Operators Supporting Memory Limits
+
+When a memory limit is configured, the following operators enforce their memory budget; most spill intermediate data to disk when the budget is exceeded (exceptions are noted below).
+
+In practice, this means memory limits and spilling can apply to both 
SedonaDB's spatial operators and DataFusion's general-purpose operators used by 
common SQL constructs.
+
+**SedonaDB:**
+
+- **Spatial joins** -- Both the build-side (index construction, partition 
collection) and probe-side (stream repartitioning) of SedonaDB's spatial joins 
support memory-pressure-driven spilling.
+
+**DataFusion (physical operators):**
+
+This list is not exhaustive. Many other DataFusion physical operators and 
execution strategies may allocate memory through the same runtime memory pool 
and may spill to disk when memory limits are enforced.
+
+- **`ORDER BY` / sorted Top-K** (`SortExec`) -- External sort that spills 
sorted runs to disk when memory is exhausted, then merges them.
+- **Hash joins** (`HashJoinExec`) -- Hash join does not support spilling yet. 
The query will fail with a memory reservation error if the hash table exceeds 
the memory limit.
+- **Sort-merge joins** (`SortMergeJoinExec`) -- Sort-merge join that spills 
buffered batches to disk when the memory limit is exceeded.
+- **`GROUP BY` aggregations** (`AggregateExec`) -- Grouped aggregation that 
spills intermediate aggregation state to sorted spill files when memory is 
exhausted.
+
+## Advanced DataFusion Configurations
+
+DataFusion provides additional execution configurations that affect spill 
behavior. These can be set via SQL `SET` statements after connecting.
+
+> **Note:** Calling `sd.sql(...)` (including `sd.sql("SET ...")`) initializes the internal context and freezes runtime options immediately. Configure `sd.options.*` runtime options (like `memory_limit` and `temp_dir`) before your first `sd.sql(...)` call, including `SET` statements.
+
+### Spill compression
+
+By default, data is written to spill files uncompressed. Enabling compression 
reduces the amount of disk I/O and disk space used at the cost of additional 
CPU work. This is beneficial when disk I/O throughput is low or when disk space 
is not large enough to hold uncompressed spill data.
+
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+sd.options.memory_limit = "4gb"
+sd.options.memory_pool_type = "fair"
+
+# Enable LZ4 compression for spill files.
+sd.sql("SET datafusion.execution.spill_compression = 'lz4_frame'").execute()
+```
+
+### Maximum temporary directory size
+
+DataFusion limits the total size of temporary spill files to prevent unbounded disk usage. The default limit is **100 GB**. If your workload needs to spill more data than this, increase the limit.
+
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+sd.options.memory_limit = "4gb"
+sd.options.memory_pool_type = "fair"
+
+# Increase the spill directory size limit to 500 GB.
+sd.sql("SET datafusion.runtime.max_temp_directory_size = '500G'").execute()
+```
+
+## System Configuration
+
+### Maximum number of open files
+
+Large workloads that spill heavily can create a large number of temporary files. During a spatial join, each parallel execution thread may create one spill file per spatial partition. The total number of open spill files can therefore reach **parallelism x number of spatial partitions**. For example, on an 8-CPU host running a spatial join that produces 500 spatial partitions, up to **8 x 500 = 4,000** spill files may be open simultaneously -- far exceeding the default per-process file descriptor limit on many systems.
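The estimate above can be sketched as a quick calculation. The numbers mirror the example; actual counts depend on the query plan and the data.

```python
# Peak open spill files ~= execution parallelism x number of spatial partitions.
parallelism = 8            # e.g. one execution thread per CPU on an 8-CPU host
spatial_partitions = 500   # workload-dependent

estimated_open_files = parallelism * spatial_partitions
print(estimated_open_files)  # 4000 -- well above the common 1024 default ulimit
```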
+
+The operating system's per-process file descriptor limit must be high enough to accommodate this; otherwise, queries will fail with "too many open files" errors.
+
+**Linux:**
+
+The default limit is typically 1024, which is easily exceeded by spill-heavy 
workloads like the example above.
+
+To raise the limit permanently, add the following to 
`/etc/security/limits.conf`:
+
+```
+*    soft    nofile    65535
+*    hard    nofile    65535
+```
+
+Then log out and back in (or reboot) for the change to take effect. Verify 
with:
+
+```bash
+ulimit -n
+```
+
+**macOS:**
+
+```bash
+ulimit -n 65535
+```
+
+This affects the current shell session only. Persistent, system-wide limits depend on the macOS version and configuration; consult Apple's documentation if you need to raise the hard limit.
diff --git a/mkdocs.yml b/mkdocs.yml
index 33c405a5..63a307ce 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -37,6 +37,8 @@ nav:
       - Home: index.md
       - Quickstart: quickstart-python.md
       - Programming Guide: programming-guide.md
+      - Configurations:
+        - Memory Management and Spilling: memory-management.md
       - API:
         - Python: reference/python.md
         - SQL: reference/sql/index.md
