comphead commented on code in PR #3809:
URL: https://github.com/apache/datafusion-comet/pull/3809#discussion_r3002834089


##########
docs/source/contributor-guide/development.md:
##########
@@ -101,6 +101,74 @@ The runtime is created once per executor JVM in a `Lazy<Runtime>` static:
 | Storing `JNIEnv` in an operator           | **No** | `JNIEnv` is thread-specific              |
 | Capturing state at plan creation time     | Yes    | Runs on executor thread, store in struct |
 
+## Global singletons
+
+Comet code runs in both the driver and executor JVM processes, and different parts of the
+codebase run in each. Global singletons have **process lifetime** — they are 
created once and
+never dropped until the JVM exits. Since multiple Spark jobs, queries, and tasks share the same
+process, this makes it difficult to reason about what state a singleton holds and whether it is
+still valid.
+
+### How to recognize them
+
+**Rust:** `static` variables using `OnceLock`, `LazyLock`, `OnceCell`, `Lazy`, or `lazy_static!`:
+
+```rust
+static TOKIO_RUNTIME: OnceLock<Runtime> = OnceLock::new();
+static TASK_SHARED_MEMORY_POOLS: Lazy<Mutex<HashMap<i64, PerTaskMemoryPool>>> = Lazy::new(..);
+```
+
+**Java:** `static` fields, especially mutable collections:
+
+```java
+private static final HashMap<Long, HashMap<Long, ScalarSubquery>> subqueryMap = new HashMap<>();
+```
+
+**Scala:** `object` declarations (companion objects are JVM singletons) holding mutable state:
+
+```scala
+object MyCache {
+  private val cache = new ConcurrentHashMap[String, Value]()
+}
+```
+
+### Why they are dangerous
+

Review Comment:
   To make this more specific, I would also add a concrete example, one that luckily was caught by the Spark tests.
   
   I tried using a global singleton at the executor level to cache Parquet metadata. If another query removes the file, the cache still holds its entry, so the system still thinks the file is available when it is not.
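
   The stale-entry problem described above can be sketched roughly as follows. This is only an illustration, not the code that was actually tried: the names `METADATA_CACHE` and `cached_len` are hypothetical, and the cached "metadata" is reduced to the file length for brevity.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;
use std::sync::{Mutex, OnceLock};

// Hypothetical process-wide cache: file path -> file length in bytes.
// It lives until the process exits and is shared by every query.
static METADATA_CACHE: OnceLock<Mutex<HashMap<String, u64>>> = OnceLock::new();

fn cache() -> &'static Mutex<HashMap<String, u64>> {
    METADATA_CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the cached length if present; otherwise reads it from disk and caches it.
/// A cache hit never re-checks the file system, which is the source of the bug.
fn cached_len(path: &str) -> Option<u64> {
    let mut map = cache().lock().unwrap();
    if let Some(len) = map.get(path) {
        return Some(*len);
    }
    let len = fs::metadata(path).ok()?.len();
    map.insert(path.to_string(), len);
    Some(len)
}

fn main() {
    let path = std::env::temp_dir()
        .join(format!("stale_cache_demo_{}.bin", std::process::id()));
    let path_str = path.to_str().expect("temp path is valid UTF-8").to_string();

    // "Query 1" writes a 4-byte file and populates the cache.
    fs::write(&path, b"PAR1").unwrap();
    assert_eq!(cached_len(&path_str), Some(4));

    // "Query 2" removes the file.
    fs::remove_file(&path).unwrap();
    assert!(!Path::new(&path_str).exists());

    // The singleton still answers as if the file were there.
    assert_eq!(cached_len(&path_str), Some(4));
}
```

   Nothing in this sketch ties an entry to the query that created it, so there is no natural point at which a stale entry gets invalidated.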



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

