bitflicker64 opened a new issue, #3064:
URL: https://github.com/apache/hugegraph/issues/3064
## Source
- **File:**
`hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/RamCache.java`
- **References:** L179-L180, L404-L405
---
Now I have the full file. Let me analyze the issue carefully.
The TODO comment is at line 180:
```java
@Watched(prefix = "ramcache")
@Override
public void clear() {
// TODO: synchronized
if (this.capacity() <= 0 || this.map.isEmpty()) {
return;
}
this.map.clear();
this.queue.clear();
}
```
The issue is that `clear()` lacks synchronization while all other mutating
methods (`write`, `remove`, `access`) use `keyLock` to protect concurrent
access to both `this.map` and `this.queue`.
Let me think through the exact problem:
1. `write()` acquires a per-key lock via `this.keyLock.lock(id)` before
modifying `this.map` and `this.queue`.
2. `remove()` acquires a per-key lock via `this.keyLock.lock(id)` before
modifying `this.map` and `this.queue`.
3. `access()` acquires a per-key lock via `this.keyLock.lock(id)` before
modifying `this.queue`.
4. `clear()` does NOT acquire any lock — it directly calls
`this.map.clear()` and `this.queue.clear()`.
The race conditions:
- `this.map.clear()` and `this.queue.clear()` are called sequentially
without any lock. Between these two calls, another thread could call `write()`
which adds a new entry to both `this.map` and `this.queue`. After `clear()`
finishes, the queue would be empty but the map would have the new entry (or
vice versa), leaving the two data structures in an inconsistent state.
- Also, `this.map.isEmpty()` check and `this.map.clear()` are not atomic —
another thread could add entries between the check and the clear.
- Even within `this.queue.clear()`, the queue's own internal locking handles
its own consistency, but the map and queue together are not kept consistent.
The key issue: `this.map` and `this.queue` must always be in sync (every
entry in the map has a corresponding node in the queue, and vice versa).
`clear()` clears them sequentially without holding any lock, so concurrent
`write()` or `remove()` calls can interleave and leave them inconsistent.
For example:
1. Thread A calls `clear()`, calls `this.map.clear()`
2. Thread B calls `write(id, value)`, acquires keyLock for `id`, adds to map
and queue
3. Thread A calls `this.queue.clear()` — now the map has the entry from
Thread B but the queue is empty → inconsistent state
The fix would need to acquire all key locks (or a global lock) before
clearing both structures atomically. Since `KeyLock` is a striped lock,
acquiring all stripes would be needed.
Let me also check the `AbstractCache` class to understand `capacity()` and
the `KeyLock` class.
Actually, I have enough information to answer the question. Let me also
check how many files would be affected — the question is about how many files
would need to change. Since this is a single method in a single file, only
`RamCache.java` would need to change. But let me check if there are other cache
implementations that might have the same issue.
Now I have all the information needed to give a thorough answer. Let me
analyze the issue.
The `clear()` method at line 179-186 of `RamCache.java`:
```java
@Watched(prefix = "ramcache")
@Override
public void clear() {
// TODO: synchronized
if (this.capacity() <= 0 || this.map.isEmpty()) {
return;
}
this.map.clear();
this.queue.clear();
}
```
**The exact issue:**
`RamCache` maintains two data structures that must always be in sync:
1. `this.map` — a `ConcurrentHashMap<Id, LinkNode<Id, Object>>`
2. `this.queue` — a `LinkedQueueNonBigLock<Id, Object>` (a custom
doubly-linked LRU queue)
Every other mutating method (`write`, `remove`, `access`) acquires a per-key
lock from `this.keyLock` (a `KeyLock` backed by a `Striped<Lock>`) before
touching either structure. This ensures that concurrent modifications to the
same key are serialized, and that `map` and `queue` stay consistent with each
other.
`clear()` does NOT acquire any lock. It calls `this.map.clear()` and then
`this.queue.clear()` sequentially. This creates several race conditions:
**Race condition 1 (map/queue inconsistency):**
- Thread A calls `clear()`, executes `this.map.clear()` — map is now empty
- Thread B calls `write(id, value)`, acquires keyLock for `id`, adds entry
to both map and queue
- Thread A calls `this.queue.clear()` — queue is now empty, but map still
has Thread B's entry
- Result: map has an entry with no corresponding queue node → broken LRU
invariant
**Race condition 2 (reverse):**
- Thread A calls `clear()`, executes `this.map.clear()`
- Thread B calls `write(id, value)`, adds to map and queue
- Thread A calls `this.queue.clear()` — clears Thread B's node from queue
- Now map has the entry but queue doesn't → same broken invariant
**Race condition 3 (early return check is not atomic):**
- `this.map.isEmpty()` returns false
- Another thread clears all entries
- `this.map.clear()` runs on an already-empty map (harmless but shows the
check isn't atomic)
**What needs to change:**
The fix requires acquiring ALL stripes of `this.keyLock` before clearing
both structures. Since `KeyLock` uses `Striped<Lock>`, you'd need to lock all
stripes (iterate `locks.size()` and lock each one), then clear both `map` and
`queue`, then unlock all. This is the only way to ensure no concurrent
`write()`/`remove()`/`access()` can interleave.
Alternatively, a dedicated `ReentrantReadWriteLock` could be used where
`write()`, `remove()`, `access()` acquire the read lock (allowing concurrency
among themselves, since they already use per-key locks), and `clear()` acquires
the write lock (exclusive).
**Files affected:**
Only **1 file** needs to change: `RamCache.java`. The fix is entirely
self-contained within that file's `clear()` method. The `KeyLock` class would
not need modification since it already exposes `lockAll()`. No callers of
`clear()` need to change.
## The Issue
`RamCache` maintains two data structures that **must always be in sync**:
- `this.map` — a `ConcurrentHashMap<Id, LinkNode<Id, Object>>`
- `this.queue` — a custom doubly-linked LRU queue (`LinkedQueueNonBigLock`)
Every other mutating method acquires a per-key lock from `this.keyLock` (a
`Striped<Lock>`) before touching either structure:
`clear()` acquires **no lock at all**:
```java
public void clear() {
// TODO: synchronized
if (this.capacity() <= 0 || this.map.isEmpty()) {
return;
}
this.map.clear(); // <-- no lock held
this.queue.clear(); // <-- no lock held
}
```
## Concrete Race Condition
The two `clear()` calls are not atomic. Between `this.map.clear()` and
`this.queue.clear()`, a concurrent `write()` can interleave:
```
Thread A (clear): map.clear() ←── map is now empty
Thread B (write): keyLock.lock(id) → map.put(id, node) → queue.enqueue(node)
Thread A (clear): queue.clear() ←── wipes Thread B's node from the queue
```
Result: `map` contains `id → node`, but `queue` has no corresponding node.
The LRU invariant is broken. Subsequent `access()` calls will try to
`queue.remove(node)` on a node that is no longer in the queue, and
`removeOldestIfCacheFull()` will `dequeue()` nothing while the map still holds
stale entries.
The reverse is also possible (queue has a node, map doesn't), and the
`map.isEmpty()` early-return check is also non-atomic with respect to
concurrent writes.
## What Needs to Change
The fix must hold **all stripes** of `this.keyLock` while clearing both
structures, so no concurrent `write()`/`remove()`/`access()` can interleave.
`KeyLock` is backed by a `Striped<Lock>` with `availableProcessors() * 4`
stripes:
The corrected `clear()` would need to lock all stripes (e.g., by iterating
`locks.getAt(i)` for all `i`), then atomically clear both `map` and `queue`,
then unlock. Alternatively, a `ReentrantReadWriteLock` could be introduced
where `write`/`remove`/`access` hold the **read lock** (they already serialize
per-key via `keyLock`) and `clear()` holds the **write lock** exclusively.
## Files Affected
**Only 1 file** needs to change: `RamCache.java`. The fix is entirely
self-contained in the `clear()` method. No callers of `clear()` and no other
cache implementations need modification.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]