bitflicker64 opened a new issue, #3064:
URL: https://github.com/apache/hugegraph/issues/3064

   ## Source
   - **File:** 
`hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/RamCache.java`
   - **References:** L179-L180, L404-L405
   
   ---
   
   Now I have the full file. Let me analyze the issue carefully.
   
   The TODO comment is at line 180:
   ```java
   @Watched(prefix = "ramcache")
   @Override
   public void clear() {
       // TODO: synchronized
       if (this.capacity() <= 0 || this.map.isEmpty()) {
           return;
       }
       this.map.clear();
       this.queue.clear();
   }
   ```
   
   The issue is that `clear()` lacks synchronization while all other mutating 
methods (`write`, `remove`, `access`) use `keyLock` to protect concurrent 
access to both `this.map` and `this.queue`.
   
   Let me think through the exact problem:
   
   1. `write()` acquires a per-key lock via `this.keyLock.lock(id)` before 
modifying `this.map` and `this.queue`.
   2. `remove()` acquires a per-key lock via `this.keyLock.lock(id)` before 
modifying `this.map` and `this.queue`.
   3. `access()` acquires a per-key lock via `this.keyLock.lock(id)` before 
modifying `this.queue`.
   4. `clear()` does NOT acquire any lock — it directly calls 
`this.map.clear()` and `this.queue.clear()`.
   
   The race conditions:
   - `this.map.clear()` and `this.queue.clear()` are called sequentially 
without any lock. Between these two calls, another thread could call `write()` 
which adds a new entry to both `this.map` and `this.queue`. After `clear()` 
finishes, the queue would be empty but the map would have the new entry (or 
vice versa), leaving the two data structures in an inconsistent state.
   - Also, `this.map.isEmpty()` check and `this.map.clear()` are not atomic — 
another thread could add entries between the check and the clear.
   - Even within `this.queue.clear()`, the queue's own internal locking handles 
its own consistency, but the map and queue together are not kept consistent.
   
   The key issue: `this.map` and `this.queue` must always be in sync (every 
entry in the map has a corresponding node in the queue, and vice versa). 
`clear()` clears them sequentially without holding any lock, so concurrent 
`write()` or `remove()` calls can interleave and leave them inconsistent.
   
   For example:
   1. Thread A calls `clear()`, calls `this.map.clear()` 
   2. Thread B calls `write(id, value)`, acquires keyLock for `id`, adds to map 
and queue
   3. Thread A calls `this.queue.clear()` — now the map has the entry from 
Thread B but the queue is empty → inconsistent state
   
   The fix would need to acquire all key locks (or a global lock) before 
clearing both structures atomically. Since `KeyLock` is a striped lock, 
acquiring all stripes would be needed.
   
   Let me also check the `AbstractCache` class to understand `capacity()` and 
the `KeyLock` class.
   
   Actually, I have enough information to answer the question. Let me also 
check how many files would be affected — the question is about how many files 
would need to change. Since this is a single method in a single file, only 
`RamCache.java` would need to change. But let me check if there are other cache 
implementations that might have the same issue.
   
   Now I have all the information needed to give a thorough answer. Let me 
analyze the issue.
   
   The `clear()` method at line 179-186 of `RamCache.java`:
   
   ```java
   @Watched(prefix = "ramcache")
   @Override
   public void clear() {
       // TODO: synchronized
       if (this.capacity() <= 0 || this.map.isEmpty()) {
           return;
       }
       this.map.clear();
       this.queue.clear();
   }
   ```
   
   **The exact issue:**
   
   `RamCache` maintains two data structures that must always be in sync:
   1. `this.map` — a `ConcurrentHashMap<Id, LinkNode<Id, Object>>`
   2. `this.queue` — a `LinkedQueueNonBigLock<Id, Object>` (a custom 
doubly-linked LRU queue)
   
   Every other mutating method (`write`, `remove`, `access`) acquires a per-key 
lock from `this.keyLock` (a `KeyLock` backed by a `Striped<Lock>`) before 
touching either structure. This ensures that concurrent modifications to the 
same key are serialized, and that `map` and `queue` stay consistent with each 
other.
   
   `clear()` does NOT acquire any lock. It calls `this.map.clear()` and then 
`this.queue.clear()` sequentially. This creates several race conditions:
   
   **Race condition 1 (map/queue inconsistency):**
   - Thread A calls `clear()`, executes `this.map.clear()` — map is now empty
   - Thread B calls `write(id, value)`, acquires keyLock for `id`, adds entry 
to both map and queue
   - Thread A calls `this.queue.clear()` — queue is now empty, but map still 
has Thread B's entry
   - Result: map has an entry with no corresponding queue node → broken LRU 
invariant
   
   **Race condition 2 (reverse):**
   - Thread A calls `clear()`, executes `this.map.clear()`
   - Thread B calls `write(id, value)`, adds to map and queue
   - Thread A calls `this.queue.clear()` — clears Thread B's node from queue
   - Now map has the entry but queue doesn't → same broken invariant
   
   **Race condition 3 (early return check is not atomic):**
   - `this.map.isEmpty()` returns false
   - Another thread clears all entries
   - `this.map.clear()` runs on an already-empty map (harmless but shows the 
check isn't atomic)
   
   **What needs to change:**
   
   The fix requires acquiring ALL stripes of `this.keyLock` before clearing 
both structures. Since `KeyLock` uses `Striped<Lock>`, you'd need to lock all 
stripes (iterate `locks.size()` and lock each one), then clear both `map` and 
`queue`, then unlock all. This is the only way to ensure no concurrent 
`write()`/`remove()`/`access()` can interleave.
   
   Alternatively, a dedicated `ReentrantReadWriteLock` could be used where 
`write()`, `remove()`, `access()` acquire the read lock (allowing concurrency 
among themselves, since they already use per-key locks), and `clear()` acquires 
the write lock (exclusive).
   
   **Files affected:**
   
   Only **1 file** needs to change: `RamCache.java`. The fix is entirely 
self-contained within that file's `clear()` method. The `KeyLock` class would 
not need modification since it already exposes `lockAll()`. No callers of 
`clear()` need to change.
   
   
   ## The Issue
   
   `RamCache` maintains two data structures that **must always be in sync**:
   
   - `this.map` — a `ConcurrentHashMap<Id, LinkNode<Id, Object>>`
   - `this.queue` — a custom doubly-linked LRU queue (`LinkedQueueNonBigLock`)
   
   Every other mutating method acquires a per-key lock from `this.keyLock` (a 
`Striped<Lock>`) before touching either structure:
   
   
   
   
   `clear()` acquires **no lock at all**:
   
   
   
   ```java
   public void clear() {
       // TODO: synchronized
       if (this.capacity() <= 0 || this.map.isEmpty()) {
           return;
       }
       this.map.clear();   // <-- no lock held
       this.queue.clear(); // <-- no lock held
   }
   ```
   
   ## Concrete Race Condition
   
   The two `clear()` calls are not atomic. Between `this.map.clear()` and 
`this.queue.clear()`, a concurrent `write()` can interleave:
   
   ```
   Thread A (clear):  map.clear()  ←── map is now empty
   Thread B (write):  keyLock.lock(id) → map.put(id, node) → queue.enqueue(node)
   Thread A (clear):  queue.clear() ←── wipes Thread B's node from the queue
   ```
   
   Result: `map` contains `id → node`, but `queue` has no corresponding node. 
The LRU invariant is broken. Subsequent `access()` calls will try to 
`queue.remove(node)` on a node that is no longer in the queue, and 
`removeOldestIfCacheFull()` will `dequeue()` nothing while the map still holds 
stale entries.
   
   The reverse is also possible (queue has a node, map doesn't), and the 
`map.isEmpty()` early-return check is also non-atomic with respect to 
concurrent writes.
   
   ## What Needs to Change
   
   The fix must hold **all stripes** of `this.keyLock` while clearing both 
structures, so no concurrent `write()`/`remove()`/`access()` can interleave. 
`KeyLock` is backed by a `Striped<Lock>` with `availableProcessors() * 4` 
stripes:
   
   
   
   The corrected `clear()` would need to lock all stripes (e.g., by iterating 
`locks.getAt(i)` for all `i`), then atomically clear both `map` and `queue`, 
then unlock. Alternatively, a `ReentrantReadWriteLock` could be introduced 
where `write`/`remove`/`access` hold the **read lock** (they already serialize 
per-key via `keyLock`) and `clear()` holds the **write lock** exclusively.
   
   ## Files Affected
   
   **Only 1 file** needs to change: `RamCache.java`. The fix is entirely 
self-contained in the `clear()` method. No callers of `clear()` and no other 
cache implementations need modification.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to