This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new feb629450 Publish built docs triggered by 6aa577b34ae066452cebc14382c8aa6e8bb332b7
feb629450 is described below

commit feb6294509e12fd4e1a0dc06276e70d25afab9e2
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Fri Mar 27 20:12:33 2026 +0000

    Publish built docs triggered by 6aa577b34ae066452cebc14382c8aa6e8bb332b7
---
 _sources/contributor-guide/development.md.txt | 68 +++++++++++++++++++++++
 contributor-guide/development.html            | 79 +++++++++++++++++++++++++++
 contributor-guide/index.html                  |  1 +
 searchindex.js                                |  2 +-
 4 files changed, 149 insertions(+), 1 deletion(-)

diff --git a/_sources/contributor-guide/development.md.txt b/_sources/contributor-guide/development.md.txt
index b83f3174d..47f034c97 100644
--- a/_sources/contributor-guide/development.md.txt
+++ b/_sources/contributor-guide/development.md.txt
@@ -101,6 +101,74 @@ The runtime is created once per executor JVM in a `Lazy<Runtime>` static:
 | Storing `JNIEnv` in an operator           | **No** | `JNIEnv` is thread-specific              |
 | Capturing state at plan creation time     | Yes    | Runs on executor thread, store in struct |
 
+## Global singletons
+
+Comet code runs in both the driver and executor JVM processes, and different parts of the
+codebase run in each. Global singletons have **process lifetime** — they are created once and
+never dropped until the JVM exits. Since multiple Spark jobs, queries, and tasks share the same
+process, this makes it difficult to reason about what state a singleton holds and whether it is
+still valid.
+
+### How to recognize them
+
+**Rust:** `static` variables using `OnceLock`, `LazyLock`, `OnceCell`, `Lazy`, or `lazy_static!`:
+
+```rust
+static TOKIO_RUNTIME: OnceLock<Runtime> = OnceLock::new();
+static TASK_SHARED_MEMORY_POOLS: Lazy<Mutex<HashMap<i64, PerTaskMemoryPool>>> = Lazy::new(..);
+```
+
+**Java:** `static` fields, especially mutable collections:
+
+```java
+private static final HashMap<Long, HashMap<Long, ScalarSubquery>> subqueryMap = new HashMap<>();
+```
+
+**Scala:** `object` declarations (companion objects are JVM singletons) holding mutable state:
+
+```scala
+object MyCache {
+  private val cache = new ConcurrentHashMap[String, Value]()
+}
+```
+
+### Why they are dangerous
+
+- **Credential staleness.** A singleton caching an authenticated client will hold stale
+  credentials after token rotation, causing silent failures mid-job.
+- **Unbounded growth.** A cache keyed by file path or configuration grows with every query
+  but never shrinks. Over hours of process uptime this becomes a memory leak.
+- **Cross-job contamination.** Different Spark jobs on the same process may use different
+  configurations. A singleton initialized by the first job silently serves wrong state to
+  subsequent jobs.
+- **Testing difficulty.** Global state persists across test cases, making tests
+  order-dependent.
+
+### When a singleton is acceptable
+
+Some state genuinely has process lifetime:
+
+| Singleton                                     | Why it is safe                                      |
+| --------------------------------------------- | --------------------------------------------------- |
+| `TOKIO_RUNTIME`                               | One runtime per executor, no configuration variance |
+| `JAVA_VM` / `JVM_CLASSES`                     | One JVM per process, set once at JNI load           |
+| `OperatorRegistry` / `ExpressionRegistry`     | Immutable after initialization                      |
+| Compiled `Regex` patterns (`LazyLock<Regex>`) | Stateless and immutable                             |
+
+### When to avoid a singleton
+
+If any of these apply, do **not** use a global singleton:
+
+- The state depends on configuration that can vary between jobs or queries
+- The state holds credentials or authenticated connections that will not expire or invalidate appropriately
+- The state grows proportionally to the number of queries or files processed
+- The state needs cleanup or refresh during process lifetime
+
+Instead, scope state to the plan or task by adding the cache as a field in an existing session or context object.
+
+If a singleton is truly needed, add a comment explaining why `static` is the right lifetime,
+whether the cache is bounded, and how credential refresh is handled (if applicable).
+
 ## Development Setup
 
 1. Make sure `JAVA_HOME` is set and point to JDK using [support matrix](../user-guide/latest/installation.md)
diff --git a/contributor-guide/development.html b/contributor-guide/development.html
index 6ea16e525..a95ac6a4a 100644
--- a/contributor-guide/development.html
+++ b/contributor-guide/development.html
@@ -560,6 +560,85 @@ to unwrap decryption keys during Parquet reads. It uses a stored <code class="do
 </div>
 </section>
 </section>
+<section id="global-singletons">
+<h2>Global singletons<a class="headerlink" href="#global-singletons" title="Link to this heading">#</a></h2>
+<p>Comet code runs in both the driver and executor JVM processes, and different parts of the
+codebase run in each. Global singletons have <strong>process lifetime</strong> — they are created once and
+never dropped until the JVM exits. Since multiple Spark jobs, queries, and tasks share the same
+process, this makes it difficult to reason about what state a singleton holds and whether it is
+still valid.</p>
+<section id="how-to-recognize-them">
+<h3>How to recognize them<a class="headerlink" href="#how-to-recognize-them" title="Link to this heading">#</a></h3>
+<p><strong>Rust:</strong> <code class="docutils literal notranslate"><span class="pre">static</span></code> variables using <code class="docutils literal notranslate"><span class="pre">OnceLock</span></code>, <code class="docutils literal notranslate"><span class="pre">LazyLock</span></code>, <code class="docutils literal notranslate"><span class="pre">OnceCell</span></code>, <code class="docutils literal notranslate"><span class="pre">Lazy</span></code>, or <code class="docutils literal [...]
+<div class="highlight-rust notranslate"><div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="n">TOKIO_RUNTIME</span><span class="p">:</span><span class="w"> </span><span class="nc">OnceLock</span><span class="o">&lt;</span><span class="n">Runtime</span><span class="o">&gt;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">OnceLock</span><span class="p">::</span><span class="n">new</span><spa [...]
+<span class="k">static</span><span class="w"> </span><span class="n">TASK_SHARED_MEMORY_POOLS</span><span class="p">:</span><span class="w"> </span><span class="nc">Lazy</span><span class="o">&lt;</span><span class="n">Mutex</span><span class="o">&lt;</span><span class="n">HashMap</span><span class="o">&lt;</span><span class="kt">i64</span><span class="p">,</span><span class="w"> </span><span class="n">PerTaskMemoryPool</span><span class="o">&gt;&gt;&gt;</span><span class="w"> </span><sp [...]
+</pre></div>
+</div>
+<p><strong>Java:</strong> <code class="docutils literal notranslate"><span class="pre">static</span></code> fields, especially mutable collections:</p>
+<div class="highlight-java notranslate"><div class="highlight"><pre><span></span><span class="kd">private</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;</span><span class="n">Long</span><span class="p">,</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;</span><span class="n">Long</span><span class="p">,</span><span cla [...]
+</pre></div>
+</div>
+<p><strong>Scala:</strong> <code class="docutils literal notranslate"><span class="pre">object</span></code> declarations (companion objects are JVM singletons) holding mutable state:</p>
+<div class="highlight-scala notranslate"><div class="highlight"><pre><span></span><span class="k">object</span><span class="w"> </span><span class="nc">MyCache</span><span class="w"> </span><span class="p">{</span>
+<span class="w">  </span><span class="k">private</span><span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">cache</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">ConcurrentHashMap</span><span class="p">[</span><span class="nc">String</span><span class="p">,</span><span class="w"> </span><span class="nc">Value</span><span class="p">]()</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</section>
+<section id="why-they-are-dangerous">
+<h3>Why they are dangerous<a class="headerlink" href="#why-they-are-dangerous" title="Link to this heading">#</a></h3>
+<ul class="simple">
+<li><p><strong>Credential staleness.</strong> A singleton caching an authenticated client will hold stale
+credentials after token rotation, causing silent failures mid-job.</p></li>
+<li><p><strong>Unbounded growth.</strong> A cache keyed by file path or configuration grows with every query
+but never shrinks. Over hours of process uptime this becomes a memory leak.</p></li>
+<li><p><strong>Cross-job contamination.</strong> Different Spark jobs on the same process may use different
+configurations. A singleton initialized by the first job silently serves wrong state to
+subsequent jobs.</p></li>
+<li><p><strong>Testing difficulty.</strong> Global state persists across test cases, making tests
+order-dependent.</p></li>
+</ul>
+</section>
+<section id="when-a-singleton-is-acceptable">
+<h3>When a singleton is acceptable<a class="headerlink" href="#when-a-singleton-is-acceptable" title="Link to this heading">#</a></h3>
+<p>Some state genuinely has process lifetime:</p>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Singleton</p></th>
+<th class="head"><p>Why it is safe</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">TOKIO_RUNTIME</span></code></p></td>
+<td><p>One runtime per executor, no configuration variance</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">JAVA_VM</span></code> / <code class="docutils literal notranslate"><span class="pre">JVM_CLASSES</span></code></p></td>
+<td><p>One JVM per process, set once at JNI load</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">OperatorRegistry</span></code> / <code class="docutils literal notranslate"><span class="pre">ExpressionRegistry</span></code></p></td>
+<td><p>Immutable after initialization</p></td>
+</tr>
+<tr class="row-odd"><td><p>Compiled <code class="docutils literal notranslate"><span class="pre">Regex</span></code> patterns (<code class="docutils literal notranslate"><span class="pre">LazyLock&lt;Regex&gt;</span></code>)</p></td>
+<td><p>Stateless and immutable</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+</section>
+<section id="when-to-avoid-a-singleton">
+<h3>When to avoid a singleton<a class="headerlink" href="#when-to-avoid-a-singleton" title="Link to this heading">#</a></h3>
+<p>If any of these apply, do <strong>not</strong> use a global singleton:</p>
+<ul class="simple">
+<li><p>The state depends on configuration that can vary between jobs or queries</p></li>
+<li><p>The state holds credentials or authenticated connections that will not expire or invalidate appropriately</p></li>
+<li><p>The state grows proportionally to the number of queries or files processed</p></li>
+<li><p>The state needs cleanup or refresh during process lifetime</p></li>
+</ul>
+<p>Instead, scope state to the plan or task by adding the cache as a field in an existing session or context object.</p>
+<p>If a singleton is truly needed, add a comment explaining why <code class="docutils literal notranslate"><span class="pre">static</span></code> is the right lifetime,
+whether the cache is bounded, and how credential refresh is handled (if applicable).</p>
+</section>
+</section>
 <section id="development-setup">
 <h2>Development Setup<a class="headerlink" href="#development-setup" title="Link to this heading">#</a></h2>
 <ol class="arabic simple">
diff --git a/contributor-guide/index.html b/contributor-guide/index.html
index 61d17dca2..c623b842f 100644
--- a/contributor-guide/index.html
+++ b/contributor-guide/index.html
@@ -516,6 +516,7 @@ under the License.
 <li class="toctree-l1"><a class="reference internal" href="development.html">Development Guide</a><ul>
 <li class="toctree-l2"><a class="reference internal" href="development.html#project-layout">Project Layout</a></li>
 <li class="toctree-l2"><a class="reference internal" href="development.html#threading-architecture">Threading Architecture</a></li>
+<li class="toctree-l2"><a class="reference internal" href="development.html#global-singletons">Global singletons</a></li>
 <li class="toctree-l2"><a class="reference internal" href="development.html#development-setup">Development Setup</a></li>
 <li class="toctree-l2"><a class="reference internal" href="development.html#build-test">Build &amp; Test</a></li>
 <li class="toctree-l2"><a class="reference internal" href="development.html#common-build-and-test-pitfalls">Common Build and Test Pitfalls</a></li>
diff --git a/searchindex.js b/searchindex.js
index 11ed9c5b1..db9597404 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Format Your Code": [[12, "format-your-code"]], "1. Install Comet": [[14, "install-comet"], [23, "install-comet"]], "1. Native Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2. Build and Verify": [[12, "build-and-verify"]], "2. Clone Iceberg and Apply Diff": [[14, "clone-iceberg-and-apply-diff"]], "2. Clone Spark and Apply Diff": [[23, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4, "sink-operators-sinks-m [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Format Your Code": [[12, "format-your-code"]], "1. Install Comet": [[14, "install-comet"], [23, "install-comet"]], "1. Native Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2. Build and Verify": [[12, "build-and-verify"]], "2. Clone Iceberg and Apply Diff": [[14, "clone-iceberg-and-apply-diff"]], "2. Clone Spark and Apply Diff": [[23, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4, "sink-operators-sinks-m [...]
\ No newline at end of file

