Re: [PR] docs: add memory management and spill configuration guide [sedona-db]

via GitHub Tue, 03 Mar 2026 08:17:48 -0800


Copilot commented on code in PR #679:
URL: https://github.com/apache/sedona-db/pull/679#discussion_r2879214714



##########
docs/memory-management.md:
##########
@@ -0,0 +1,241 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Memory Management and Spilling
+
+SedonaDB supports memory-limited execution with automatic spill-to-disk,
+allowing you to process datasets that are larger than available memory. When a
+memory limit is configured, operators that exceed their memory budget
+automatically spill intermediate data to temporary files on disk and read them
+back as needed.
+
+## Configuring Memory Limits
+
+Set `memory_limit` on the context options to cap the total memory available for
+query execution. The limit accepts an integer (bytes) or a human-readable string
+such as `"4gb"`, `"512m"`, or `"1.5g"`.
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+sd.options.memory_limit = "4gb"
+```
+
+Without a memory limit, SedonaDB uses an unbounded memory pool and operators
+can use as much memory as needed (until the process hits system limits). In
+this mode, operators typically won't spill to disk because there is no memory
+budget to enforce.
+
+!!! note
+    All runtime options (`memory_limit`, `memory_pool_type`, `temp_dir`,
+    `unspillable_reserve_ratio`) must be set **before** the first query is
+    executed. Once the first query runs, the internal execution context is
+    created and these options become read-only.

Review Comment:
   The wording here suggests runtime options can be set any time before the 
*first query is executed*, but in the Python API the internal context is 
initialized (and options are frozen) on the first call that touches `sd._impl` 
(e.g., `sd.sql(...)`, `sd.read_parquet(...)`), even if you never call 
`.execute()`/`.show()`. Consider rephrasing to “before the internal context is 
initialized / before the first call to `sd.sql` or any read method” to avoid 
users constructing a DataFrame and then finding options are already read-only.
   ```suggestion
       `unspillable_reserve_ratio`) must be set **before** the internal context
       is initialized — that is, before the first call to `sd.sql(...)` or any
       read method (for example, `sd.read_parquet(...)`). Once the internal
       context is created, these options become read-only.
   ```
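
   To make the distinction concrete, here is a minimal, self-contained sketch of the lazy-initialization pattern described above. It is not SedonaDB's code: `Options`, `Connection`, the `freeze()` helper, and the `RuntimeError` are hypothetical stand-ins chosen for illustration, and the real error type and message may differ. It only shows why building a lazy DataFrame can freeze the options even though nothing has executed yet.

```python
class Options:
    """Hypothetical stand-in for the connection's runtime options."""

    def __init__(self):
        self._frozen = False
        self._values = {"memory_limit": None}

    @property
    def memory_limit(self):
        return self._values["memory_limit"]

    @memory_limit.setter
    def memory_limit(self, value):
        # Reject writes once the internal context exists.
        if self._frozen:
            raise RuntimeError("options are read-only once the context is initialized")
        self._values["memory_limit"] = value

    def freeze(self):
        self._frozen = True


class Connection:
    """Hypothetical stand-in for the object returned by a connect() call."""

    def __init__(self):
        self.options = Options()
        self._ctx = None

    @property
    def _impl(self):
        # Lazily create the internal context on first access and freeze options.
        if self._ctx is None:
            self.options.freeze()
            self._ctx = {"memory_limit": self.options.memory_limit}
        return self._ctx

    def sql(self, query):
        # Building a lazy result still touches _impl, so options freeze here,
        # even though the query itself is never executed.
        _ = self._impl
        return ("lazy-dataframe", query)


sd = Connection()
sd.options.memory_limit = "4gb"  # allowed: context not yet initialized
df = sd.sql("SELECT 1")          # initializes the context; nothing is executed

try:
    sd.options.memory_limit = "8gb"
    frozen = False
except RuntimeError:
    frozen = True
```

   In this sketch the second assignment fails right after `sd.sql(...)`, without any `.execute()` or `.show()` call, which is the situation the suggested wording is meant to warn users about.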



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
