asf-tooling commented on issue #1128:
URL:
https://github.com/apache/tooling-trusted-releases/issues/1128#issuecomment-4409781838
<!-- gofannon-issue-triage-bot v2 -->
**Automated triage** — analyzed at `main@2da7807a`
**Type:** `new_feature` • **Classification:** `actionable` •
**Confidence:** `high`
**Application domain(s):** `authentication_authorization`,
`shared_infrastructure`
### Summary
The `Cache` class in `atr/principal.py` stores committee/project membership
data per user but never removes entries for inactive users, causing unbounded
memory growth. The issue requests an eviction mechanism for entries not
refreshed within 2x TTL (1200 seconds). @alitheg confirmed this should be done
from `admins_refresh_loop` since background task workers can't interact with
in-memory storage in the main process.
### Where this lives in the code today
#### `atr/principal.py` — `Cache` (lines 198-214)
_needs modification_
This class needs an eviction method to remove entries not refreshed within
2x TTL.
```python
class Cache:
def __init__(self, cache_for_at_most_seconds: int = 600):
self.cache_for_at_most_seconds = cache_for_at_most_seconds
self.last_refreshed: dict[str, int | None] = {}
self.member_of: dict[str, frozenset[str]] = {}
self.participant_of: dict[str, frozenset[str]] = {}
def outdated(self, asf_uid: str) -> bool:
last_refreshed = self.last_refreshed.get(asf_uid)
if last_refreshed is None:
return True
now = int(time.time())
since_last_refresh = now - last_refreshed
return since_last_refresh > self.cache_for_at_most_seconds
cache = Cache()
```
#### `atr/cache.py` — `admins_refresh_loop` (lines 62-71)
_extension point_
@alitheg indicated this is where the principal cache eviction should be
called, since it runs in the main process event loop.
```python
async def admins_refresh_loop() -> None:
while True:
await asyncio.sleep(ADMINS_POLL_INTERVAL_SECONDS)
try:
users = await ldap.fetch_admin_users()
await admins_save_to_file(users)
_admins_update_app_extensions(users)
log.info(f"Admin users cache refreshed: {len(users)} users")
except Exception as e:
log.warning(f"Admin refresh failed: {e}")
```
#### `atr/principal.py` — `AuthoriserASFQuart.cache_refresh` (lines 236-255)
_currently does this_
This method adds entries to the cache but never removes them - the core of
the unbounded growth problem.
```python
async def cache_refresh(self, asf_uid: str, asfquart_session:
sql.UserSession) -> None:
if not self.__cache.outdated(asf_uid):
return
if not isinstance(asfquart_session, sql.UserSession):
# Defense in depth runtime check, already validated by the type
checker
raise AuthenticationError("Session is not a UserSession")
committees = frozenset(asfquart_session.committees)
projects = frozenset(asfquart_session.projects)
if "tooling" in projects:
# Tooling project members are actually Tooling committee members
# This is a special case, and reflects a similar special case in
LDAP
committees = committees.union({"tooling"})
committees, projects = _augment_test_membership(committees, projects)
# We do not check that the ASF UID is the same as the one in the
session
# It is the caller's responsibility to ensure this
self.__cache.member_of[asf_uid] = committees
self.__cache.participant_of[asf_uid] = projects
self.__cache.last_refreshed[asf_uid] = int(time.time())
```
### Where new code would go
- `atr/principal.py` — after symbol Cache.outdated
Add evict_stale() method to the Cache class that removes entries older
than 2x TTL
- `tests/unit/test_principal_cache.py` — new file
Unit tests verifying eviction logic removes stale entries and preserves
fresh ones
### Proposed approach
Add an `evict_stale()` method to the `Cache` class in `atr/principal.py`
that iterates over `last_refreshed` and removes entries where the elapsed time
exceeds `2 * cache_for_at_most_seconds`. This method removes the corresponding
keys from all three dictionaries (`last_refreshed`, `member_of`,
`participant_of`). Then, in `atr/cache.py`'s `admins_refresh_loop()`, import
and call this eviction method on each iteration. This follows @alitheg's
recommendation to use `admins_refresh_loop` since it runs in the main process
and can access the in-memory cache directly. The eviction runs every ~3631
seconds (the admin poll interval), which is well beyond the 1200-second
staleness threshold, ensuring inactive users are cleaned up within a reasonable
window.
### Suggested patches
#### `atr/principal.py`
Add evict_stale() method to Cache class that removes entries not refreshed
within 2x TTL
````diff
--- a/atr/principal.py
+++ b/atr/principal.py
@@ -170,6 +170,7 @@
class Cache:
def __init__(self, cache_for_at_most_seconds: int = 600):
self.cache_for_at_most_seconds = cache_for_at_most_seconds
+ self.eviction_threshold_seconds = 2 * cache_for_at_most_seconds
self.last_refreshed: dict[str, int | None] = {}
self.member_of: dict[str, frozenset[str]] = {}
self.participant_of: dict[str, frozenset[str]] = {}
@@ -181,6 +182,19 @@
since_last_refresh = now - last_refreshed
return since_last_refresh > self.cache_for_at_most_seconds
+ def evict_stale(self) -> int:
+ """Remove entries not refreshed within 2x TTL. Returns number of
evicted entries."""
+ now = int(time.time())
+ stale_uids: list[str] = []
+ for asf_uid, last_refreshed in self.last_refreshed.items():
+ if last_refreshed is None or (now - last_refreshed) >
self.eviction_threshold_seconds:
+ stale_uids.append(asf_uid)
+ for asf_uid in stale_uids:
+ del self.last_refreshed[asf_uid]
+ self.member_of.pop(asf_uid, None)
+ self.participant_of.pop(asf_uid, None)
+ return len(stale_uids)
+
cache = Cache()
````
#### `atr/cache.py`
Call principal cache eviction from admins_refresh_loop as recommended by
@alitheg
````diff
--- a/atr/cache.py
+++ b/atr/cache.py
@@ -30,6 +30,7 @@
import atr.config as config
import atr.ldap as ldap
import atr.log as log
+import atr.principal as principal
import atr.models.safe as safe
import atr.models.schema as schema
@@ -63,6 +64,11 @@
async def admins_refresh_loop() -> None:
while True:
await asyncio.sleep(ADMINS_POLL_INTERVAL_SECONDS)
+ try:
+ evicted = principal.cache.evict_stale()
+ if evicted > 0:
+ log.info(f"Principal authorization cache: evicted {evicted}
stale entries")
+ except Exception as e:
+ log.warning(f"Principal cache eviction failed: {e}")
try:
users = await ldap.fetch_admin_users()
await admins_save_to_file(users)
````
#### `tests/unit/test_principal_cache.py`
Unit tests verifying eviction logic removes stale entries and preserves
fresh ones
````diff
--- /dev/null
+++ b/tests/unit/test_principal_cache.py
@@ -0,0 +1,67 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import time
+
+from atr.principal import Cache
+
+
+def test_evict_stale_removes_old_entries():
+ cache = Cache(cache_for_at_most_seconds=600)
+ now = int(time.time())
+
+ # Add a stale entry (older than 2x TTL = 1200 seconds)
+ cache.last_refreshed["stale_user"] = now - 1500
+ cache.member_of["stale_user"] = frozenset({"proj-a"})
+ cache.participant_of["stale_user"] = frozenset({"proj-a"})
+
+ evicted = cache.evict_stale()
+
+ assert evicted == 1
+ assert "stale_user" not in cache.last_refreshed
+ assert "stale_user" not in cache.member_of
+ assert "stale_user" not in cache.participant_of
+
+
+def test_evict_stale_preserves_fresh_entries():
+ cache = Cache(cache_for_at_most_seconds=600)
+ now = int(time.time())
+
+ # Add a fresh entry (within 2x TTL)
+ cache.last_refreshed["fresh_user"] = now - 100
+ cache.member_of["fresh_user"] = frozenset({"proj-b"})
+ cache.participant_of["fresh_user"] = frozenset({"proj-b"})
+
+ evicted = cache.evict_stale()
+
+ assert evicted == 0
+ assert "fresh_user" in cache.last_refreshed
+ assert "fresh_user" in cache.member_of
+ assert "fresh_user" in cache.participant_of
+
+
+def test_evict_stale_mixed_entries():
+ cache = Cache(cache_for_at_most_seconds=600)
+ now = int(time.time())
+
+ cache.last_refreshed["stale"] = now - 1500
+ cache.member_of["stale"] = frozenset({"proj-a"})
+ cache.participant_of["stale"] = frozenset({"proj-a"})
+
+ cache.last_refreshed["fresh"] = now - 100
+ cache.member_of["fresh"] = frozenset({"proj-b"})
+ cache.participant_of["fresh"] = frozenset({"proj-b"})
+
+ evicted = cache.evict_stale()
+
+ assert evicted == 1
+ assert "stale" not in cache.last_refreshed
+ assert "fresh" in cache.last_refreshed
+
+
+def test_evict_stale_removes_none_last_refreshed():
+ cache = Cache(cache_for_at_most_seconds=600)
+ cache.last_refreshed["none_user"] = None
+ cache.member_of["none_user"] = frozenset()
+ cache.participant_of["none_user"] = frozenset()
+
+ evicted = cache.evict_stale()
+
+ assert evicted == 1
+ assert "none_user" not in cache.last_refreshed
````
### Open questions
- Should there be a circular import concern between atr/cache.py importing
atr/principal.py? Need to verify the existing import graph doesn't create a
cycle.
- Should the eviction threshold be configurable, or is 2x TTL (1200 seconds)
always appropriate?
- Should we log at DEBUG level instead of INFO for the eviction message to
avoid log noise when frequently evicting small numbers of entries?
### Files examined
- `atr/principal.py`
- `tests/unit/test_cache.py`
- `atr/cache.py`
- `atr/ldap.py`
- `atr/db/__init__.py`
- `atr/models/sql.py`
- `tests/unit/test_ldap.py`
- `atr/sessions.py`
---
*Draft from a triage agent. A human reviewer should validate before merging
any change. The agent did not run tests or verify diffs apply.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]