asf-tooling commented on issue #1128:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/1128#issuecomment-4409781838

   <!-- gofannon-issue-triage-bot v2 -->
   
   **Automated triage** — analyzed at `main@2da7807a`
   
   **Type:** `new_feature`  •  **Classification:** `actionable`  •  
**Confidence:** `high`
   **Application domain(s):** `authentication_authorization`, 
`shared_infrastructure`
   
   ### Summary
   The `Cache` class in `atr/principal.py` stores committee/project membership 
data per user but never removes entries for inactive users, causing unbounded 
memory growth. The issue requests an eviction mechanism for entries not 
refreshed within 2x TTL (1200 seconds). @alitheg confirmed this should be done 
from `admins_refresh_loop` since background task workers can't interact with 
in-memory storage in the main process.
   
   ### Where this lives in the code today
   
   #### `atr/principal.py` — `Cache` (lines 198-214)
   _needs modification_
   This class needs an eviction method to remove entries not refreshed within 
2x TTL.
   
   ```python
   class Cache:
       def __init__(self, cache_for_at_most_seconds: int = 600):
           self.cache_for_at_most_seconds = cache_for_at_most_seconds
           self.last_refreshed: dict[str, int | None] = {}
           self.member_of: dict[str, frozenset[str]] = {}
           self.participant_of: dict[str, frozenset[str]] = {}
   
       def outdated(self, asf_uid: str) -> bool:
           last_refreshed = self.last_refreshed.get(asf_uid)
           if last_refreshed is None:
               return True
           now = int(time.time())
           since_last_refresh = now - last_refreshed
           return since_last_refresh > self.cache_for_at_most_seconds
   
   
   cache = Cache()
   ```
   
   #### `atr/cache.py` — `admins_refresh_loop` (lines 62-71)
   _extension point_
   @alitheg indicated this is where the principal cache eviction should be 
called, since it runs in the main process event loop.
   
   ```python
   async def admins_refresh_loop() -> None:
       while True:
           await asyncio.sleep(ADMINS_POLL_INTERVAL_SECONDS)
           try:
               users = await ldap.fetch_admin_users()
               await admins_save_to_file(users)
               _admins_update_app_extensions(users)
               log.info(f"Admin users cache refreshed: {len(users)} users")
           except Exception as e:
               log.warning(f"Admin refresh failed: {e}")
   ```
   
   #### `atr/principal.py` — `AuthoriserASFQuart.cache_refresh` (lines 236-255)
   _currently does this_
   This method adds entries to the cache but never removes them - the core of 
the unbounded growth problem.
   
   ```python
       async def cache_refresh(self, asf_uid: str, asfquart_session: 
sql.UserSession) -> None:
           if not self.__cache.outdated(asf_uid):
               return
           if not isinstance(asfquart_session, sql.UserSession):
               # Defense in depth runtime check, already validated by the type 
checker
               raise AuthenticationError("Session is not a UserSession")
   
           committees = frozenset(asfquart_session.committees)
           projects = frozenset(asfquart_session.projects)
           if "tooling" in projects:
               # Tooling project members are actually Tooling committee members
               # This is a special case, and reflects a similar special case in 
LDAP
               committees = committees.union({"tooling"})
           committees, projects = _augment_test_membership(committees, projects)
   
           # We do not check that the ASF UID is the same as the one in the 
session
           # It is the caller's responsibility to ensure this
           self.__cache.member_of[asf_uid] = committees
           self.__cache.participant_of[asf_uid] = projects
           self.__cache.last_refreshed[asf_uid] = int(time.time())
   ```
   
   ### Where new code would go
   - `atr/principal.py` — after symbol Cache.outdated
     Add evict_stale() method to the Cache class that removes entries older 
than 2x TTL
   - `tests/unit/test_principal_cache.py` — new file
     Unit tests verifying eviction logic removes stale entries and preserves 
fresh ones
   
   ### Proposed approach
   Add an `evict_stale()` method to the `Cache` class in `atr/principal.py` 
that iterates over `last_refreshed` and removes entries where the elapsed time 
exceeds `2 * cache_for_at_most_seconds`. This method removes the corresponding 
keys from all three dictionaries (`last_refreshed`, `member_of`, 
`participant_of`). Then, in `atr/cache.py`'s `admins_refresh_loop()`, import 
and call this eviction method on each iteration. This follows @alitheg's 
recommendation to use `admins_refresh_loop` since it runs in the main process 
and can access the in-memory cache directly. The eviction runs every ~3631 
seconds (the admin poll interval), which is well beyond the 1200-second 
staleness threshold, ensuring inactive users are cleaned up within a reasonable 
window.
   
   ### Suggested patches
   
   #### `atr/principal.py`
   Add evict_stale() method to Cache class that removes entries not refreshed 
within 2x TTL
   
   ````diff
   --- a/atr/principal.py
   +++ b/atr/principal.py
   @@ -170,6 +170,7 @@
    class Cache:
        def __init__(self, cache_for_at_most_seconds: int = 600):
            self.cache_for_at_most_seconds = cache_for_at_most_seconds
   +        self.eviction_threshold_seconds = 2 * cache_for_at_most_seconds
            self.last_refreshed: dict[str, int | None] = {}
            self.member_of: dict[str, frozenset[str]] = {}
            self.participant_of: dict[str, frozenset[str]] = {}
   @@ -181,6 +182,19 @@
            since_last_refresh = now - last_refreshed
            return since_last_refresh > self.cache_for_at_most_seconds
    
   +    def evict_stale(self) -> int:
   +        """Remove entries not refreshed within 2x TTL. Returns number of 
evicted entries."""
   +        now = int(time.time())
   +        stale_uids: list[str] = []
   +        for asf_uid, last_refreshed in self.last_refreshed.items():
   +            if last_refreshed is None or (now - last_refreshed) > 
self.eviction_threshold_seconds:
   +                stale_uids.append(asf_uid)
   +        for asf_uid in stale_uids:
   +            del self.last_refreshed[asf_uid]
   +            self.member_of.pop(asf_uid, None)
   +            self.participant_of.pop(asf_uid, None)
   +        return len(stale_uids)
   +
    
    cache = Cache()
   ````
   
   #### `atr/cache.py`
   Call principal cache eviction from admins_refresh_loop as recommended by 
@alitheg
   
   ````diff
   --- a/atr/cache.py
   +++ b/atr/cache.py
   @@ -30,6 +30,7 @@
    import atr.config as config
    import atr.ldap as ldap
    import atr.log as log
   +import atr.principal as principal
    import atr.models.safe as safe
    import atr.models.schema as schema
    
   @@ -63,6 +64,11 @@
    async def admins_refresh_loop() -> None:
        while True:
            await asyncio.sleep(ADMINS_POLL_INTERVAL_SECONDS)
   +        try:
   +            evicted = principal.cache.evict_stale()
   +            if evicted > 0:
   +                log.info(f"Principal authorization cache: evicted {evicted} 
stale entries")
   +        except Exception as e:
   +            log.warning(f"Principal cache eviction failed: {e}")
            try:
                users = await ldap.fetch_admin_users()
                await admins_save_to_file(users)
   ````
   
   #### `tests/unit/test_principal_cache.py`
   Unit tests verifying eviction logic removes stale entries and preserves 
fresh ones
   
   ````diff
   --- /dev/null
   +++ b/tests/unit/test_principal_cache.py
   @@ -0,0 +1,67 @@
   +# Licensed to the Apache Software Foundation (ASF) under one
   +# or more contributor license agreements.  See the NOTICE file
   +# distributed with this work for additional information
   +# regarding copyright ownership.  The ASF licenses this file
   +# to you under the Apache License, Version 2.0 (the
   +# "License"); you may not use this file except in compliance
   +# with the License.  You may obtain a copy of the License at
   +#
   +#   http://www.apache.org/licenses/LICENSE-2.0
   +#
   +# Unless required by applicable law or agreed to in writing,
   +# software distributed under the License is distributed on an
   +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   +# KIND, either express or implied.  See the License for the
   +# specific language governing permissions and limitations
   +# under the License.
   +
   +import time
   +
   +from atr.principal import Cache
   +
   +
   +def test_evict_stale_removes_old_entries():
   +    cache = Cache(cache_for_at_most_seconds=600)
   +    now = int(time.time())
   +
   +    # Add a stale entry (older than 2x TTL = 1200 seconds)
   +    cache.last_refreshed["stale_user"] = now - 1500
   +    cache.member_of["stale_user"] = frozenset({"proj-a"})
   +    cache.participant_of["stale_user"] = frozenset({"proj-a"})
   +
   +    evicted = cache.evict_stale()
   +
   +    assert evicted == 1
   +    assert "stale_user" not in cache.last_refreshed
   +    assert "stale_user" not in cache.member_of
   +    assert "stale_user" not in cache.participant_of
   +
   +
   +def test_evict_stale_preserves_fresh_entries():
   +    cache = Cache(cache_for_at_most_seconds=600)
   +    now = int(time.time())
   +
   +    # Add a fresh entry (within 2x TTL)
   +    cache.last_refreshed["fresh_user"] = now - 100
   +    cache.member_of["fresh_user"] = frozenset({"proj-b"})
   +    cache.participant_of["fresh_user"] = frozenset({"proj-b"})
   +
   +    evicted = cache.evict_stale()
   +
   +    assert evicted == 0
   +    assert "fresh_user" in cache.last_refreshed
   +    assert "fresh_user" in cache.member_of
   +    assert "fresh_user" in cache.participant_of
   +
   +
   +def test_evict_stale_mixed_entries():
   +    cache = Cache(cache_for_at_most_seconds=600)
   +    now = int(time.time())
   +
   +    cache.last_refreshed["stale"] = now - 1500
   +    cache.member_of["stale"] = frozenset({"proj-a"})
   +    cache.participant_of["stale"] = frozenset({"proj-a"})
   +
   +    cache.last_refreshed["fresh"] = now - 100
   +    cache.member_of["fresh"] = frozenset({"proj-b"})
   +    cache.participant_of["fresh"] = frozenset({"proj-b"})
   +
   +    evicted = cache.evict_stale()
   +
   +    assert evicted == 1
   +    assert "stale" not in cache.last_refreshed
   +    assert "fresh" in cache.last_refreshed
   +
   +
   +def test_evict_stale_removes_none_last_refreshed():
   +    cache = Cache(cache_for_at_most_seconds=600)
   +    cache.last_refreshed["none_user"] = None
   +    cache.member_of["none_user"] = frozenset()
   +    cache.participant_of["none_user"] = frozenset()
   +
   +    evicted = cache.evict_stale()
   +
   +    assert evicted == 1
   +    assert "none_user" not in cache.last_refreshed
   ````
   
   ### Open questions
   - Should there be a circular import concern between atr/cache.py importing 
atr/principal.py? Need to verify the existing import graph doesn't create a 
cycle.
   - Should the eviction threshold be configurable, or is 2x TTL (1200 seconds) 
always appropriate?
   - Should we log at DEBUG level instead of INFO for the eviction message to 
avoid log noise when frequently evicting small numbers of entries?
   
   ### Files examined
   - `atr/principal.py`
   - `tests/unit/test_cache.py`
   - `atr/cache.py`
   - `atr/ldap.py`
   - `atr/db/__init__.py`
   - `atr/models/sql.py`
   - `tests/unit/test_ldap.py`
   - `atr/sessions.py`
   
   ---
   *Draft from a triage agent. A human reviewer should validate before merging 
any change. The agent did not run tests or verify diffs apply.*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to