asf-tooling commented on issue #618:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/618#issuecomment-4410133984

   <!-- gofannon-issue-triage-bot v2 -->
   
   **Automated triage** — analyzed at `main@2da7807a`
   
   **Type:** `refactor`  •  **Classification:** `actionable`  •  
**Confidence:** `medium`
   **Application domain(s):** `authentication_authorization`, 
`shared_infrastructure`
   
   ### Summary
   Issue #618 asks to improve session test data to use realistic workflows 
(from ASFQuart/LDAP) instead of synthesized data, and to address the 
`read`/`write` asymmetry where ASFQuart's `write()` takes `pmcs` but `read()` 
returns `committees`. @alitheg noted an open PR (infrastructure-asfquart#49) 
that would remove the need for the before_request re-write. @dave2wave 
suggested treating `committees` as an alias for `pmcs` in 
`_prepare_session_data` and erroring if both are present. The code in 
`_prepare_session_data` already maps `pmcs`→`committees`, but doesn't accept 
`committees` directly as an alias, and the test helper `_session_data()` uses 
minimal synthetic fields.
   
   ### Where this lives in the code today
   
   #### `atr/sessions.py` — `_prepare_session_data` (lines 112-133)
   _needs modification_
   This function maps raw OAuth dict keys to internal UserSession fields but 
doesn't handle the case where `committees` is passed directly (the alias 
@dave2wave suggested), nor does it error when both `pmcs` and `committees` are 
present.
   
   ```python
   def _prepare_session_data(session_data: dict[str, Any]) -> dict[str, Any]:
       # Field names match the raw OAuth dict from asfquart
       # They do not match ClientSession attribute names
       # ClientSession remaps on construction
       # Raw "pmcs" becomes ClientSession.committees
       # Raw "roleaccount" becomes ClientSession.isRole
       # Since awrite() passes the raw dict, we map from the raw key names here
       data = dict(session_data)
       if "pmcs" in data:
           data["committees"] = data.pop("pmcs")
       if "isMember" in data:
           data["is_member"] = data.pop("isMember")
       if "isChair" in data:
           data["is_chair"] = data.pop("isChair")
       # if "isRoot" in data:
       #     data["is_root"] = data.pop("isRoot")
       if "roleaccount" in data:
           data["is_role"] = data.pop("roleaccount")
       data.pop("metadata", None)
       data.pop("cts", None)
       data.pop("uts", None)
       return {k: v for k, v in data.items() if k in _SESSION_DATA_FIELDS}
   ```
   
   #### `tests/unit/test_sessions.py` — `_session_data` (lines 266-269)
   _needs modification_
   The test helper uses minimal synthesized session data that doesn't reflect 
realistic ASFQuart/LDAP output (missing fields like fullname, email, dn, 
isMember, isChair, mfa).
   
   ```python
   def _session_data(uid: str = "alice", **kwargs) -> dict:
       base = {"uid": uid, "pmcs": ["test"], "projects": ["test"]}
       base.update(kwargs)
       return base
   ```
   
   #### `tests/unit/test_sessions.py` — `test_create_maps_pmcs_to_committees` 
(lines 55-60)
   _currently does this_
   Tests the pmcs→committees mapping but only with minimal synthetic data; 
doesn't test the alias case or error on both fields present.
   
   ```python
   async def test_create_maps_pmcs_to_committees(store):
       hsid = _hsid()
       await store.create(hsid, _session_data())
       user_session = await store.validate(hsid)
       assert user_session is not None
       assert user_session.committees == ["test"]
   ```
   
   #### `atr/principal.py` — `AuthoriserLDAP.cache_refresh` (lines 277-302)
   _currently does this_
   Shows how LDAP authoriser already has a Debug-mode path reading from a 
session cache file, and test mode with hardcoded data - a potential extension 
point for more realistic test data.
   
   ```python
       async def cache_refresh(self, asf_uid: str) -> None:
           if not self.__cache.outdated(asf_uid):
               return
   
           if config.is_test_mode() and (asf_uid == "test"):
               # The test user does not exist in LDAP, so we hardcode their data
               committees = frozenset({"test"})
               projects = frozenset({"test"})
               self.__cache.member_of[asf_uid] = committees
               self.__cache.participant_of[asf_uid] = projects
               self.__cache.last_refreshed[asf_uid] = int(time.time())
               return
   
           if config.get_mode() == config.Mode.Debug:
               session_cache = await util.session_cache_read()
               if asf_uid in session_cache:
                   cached_session = session_cache[asf_uid]
                   committees = frozenset(cached_session.get("pmcs", []))
                   projects = frozenset(cached_session.get("projects", []))
                   committees, projects = _augment_test_membership(committees, 
projects)
   
                   self.__cache.member_of[asf_uid] = committees
                   self.__cache.participant_of[asf_uid] = projects
                   self.__cache.last_refreshed[asf_uid] = int(time.time())
                   log.info(f"Loaded session data for {asf_uid} from session 
cache file")
                   return
   ```
   
   #### `typestubs/asfquart/session.pyi` — `ClientSession` (lines 9-26)
   _currently does this_
   Shows the read-side ClientSession type with `committees` field, which is 
what ASFQuart returns after the internal remapping from raw `pmcs`.
   
   ```python
   class ClientSession(dict):
       uid: str | None
       dn: str | None
       fullname: str | None
       email: str
       isMember: bool
       isChair: bool
       isRoot: bool
       committees: list[str]
       projects: list[str]
       mfa: bool
       isRole: bool
       metadata: dict
   
       def __init__(self, raw_data: dict) -> None:
           """Initializes a client session from a raw dict. ClientSession is a 
subclassed dict, so that
           we can send it to quart in a format it can render."""
           ...
   ```
   
   ### Where new code would go
   - `tests/unit/test_sessions.py` — after symbol _session_data
     Add a more realistic session data fixture that mirrors actual ASFQuart 
OAuth output, including all fields like fullname, dn, email, isMember, isChair, 
mfa
   
   ### Proposed approach
   The fix has two parts. First, make `_prepare_session_data` accept 
`committees` as an alias for `pmcs` (as @dave2wave suggested), raising a 
`ValueError` if both are present simultaneously. This handles the read/write 
asymmetry at ATR's boundary without requiring upstream ASFQuart changes.
   
   Second, enrich the test helper `_session_data()` to include realistic fields 
that ASFQuart's OAuth flow actually provides (fullname, dn, email, isMember, 
isChair, mfa, projects), and add test cases that exercise both the `pmcs` path 
and the `committees` alias path. This ensures tests catch issues like #617 
where synthesized data diverges from real session data. The external 
infrastructure-asfquart#49 PR may eventually remove the asymmetry at source, 
but ATR should be defensive regardless.
   
   ### Suggested patches
   
   #### `atr/sessions.py`
   Accept `committees` as alias for `pmcs` and error if both present, per 
@dave2wave's suggestion
   
   ````diff
   --- a/atr/sessions.py
   +++ b/atr/sessions.py
   @@ -100,10 +100,16 @@ def _prepare_session_data(session_data: dict[str, 
Any]) -> dict[str, Any]:
        # Raw "pmcs" becomes ClientSession.committees
        # Raw "roleaccount" becomes ClientSession.isRole
        # Since awrite() passes the raw dict, we map from the raw key names here
        data = dict(session_data)
   +    if "pmcs" in data and "committees" in data:
   +        raise ValueError("Session data must not contain both 'pmcs' and 
'committees'")
        if "pmcs" in data:
            data["committees"] = data.pop("pmcs")
   +    # Accept 'committees' directly as an alias (e.g. from ClientSession 
read output)
   +    # No remapping needed since it already matches the UserSession field 
name
        if "isMember" in data:
            data["is_member"] = data.pop("isMember")
   +    if "is_member" not in data and "isMember" not in session_data:
   +        pass  # Field not present, will use model default
        if "isChair" in data:
            data["is_chair"] = data.pop("isChair")
        # if "isRoot" in data:
   ````
   
   #### `tests/unit/test_sessions.py`
   Add a realistic session data helper and tests for the committees alias and 
conflict detection
   
   ````diff
   --- a/tests/unit/test_sessions.py
   +++ b/tests/unit/test_sessions.py
   @@ -230,6 +230,51 @@ def _hsid() -> str:
        return f"test-hash-{_COUNTER}"
    
    
    def _session_data(uid: str = "alice", **kwargs) -> dict:
        base = {"uid": uid, "pmcs": ["test"], "projects": ["test"]}
        base.update(kwargs)
        return base
   +
   +
   +def _realistic_session_data(uid: str = "alice", **kwargs) -> dict:
   +    """Session data that mirrors actual ASFQuart OAuth callback output."""
   +    base = {
   +        "uid": uid,
   +        "dn": f"uid={uid},ou=people,dc=apache,dc=org",
   +        "fullname": "Alice Example",
   +        "email": f"{uid}@apache.org",
   +        "pmcs": ["test"],
   +        "projects": ["test"],
   +        "isMember": False,
   +        "isChair": False,
   +        "roleaccount": False,
   +        "mfa": True,
   +    }
   +    base.update(kwargs)
   +    return base
   +
   +
   +async def test_create_with_realistic_session_data(store):
   +    hsid = _hsid()
   +    await store.create(hsid, _realistic_session_data())
   +    user_session = await store.validate(hsid)
   +    assert user_session is not None
   +    assert user_session.uid == "alice"
   +    assert user_session.committees == ["test"]
   +    assert user_session.projects == ["test"]
   +    assert user_session.is_member is False
   +    assert user_session.is_chair is False
   +    assert user_session.is_role is False
   +
   +
   +async def test_create_accepts_committees_as_alias(store):
   +    """committees can be passed directly (e.g. from ClientSession read 
output)."""
   +    hsid = _hsid()
   +    await store.create(hsid, {"uid": "bob", "committees": ["infra"], 
"projects": ["infra"]})
   +    user_session = await store.validate(hsid)
   +    assert user_session is not None
   +    assert user_session.committees == ["infra"]
   +
   +
   +async def test_create_rejects_both_pmcs_and_committees(store):
   +    """Having both pmcs and committees in session data is ambiguous and 
rejected."""
   +    hsid = _hsid()
   +    with pytest.raises(ValueError, match="must not contain both"):
   +        await store.create(hsid, {"uid": "eve", "pmcs": ["a"], 
"committees": ["b"]})
   ````
   
   ### Open questions
   - What is the status of infrastructure-asfquart PR #49 mentioned by 
@alitheg? If merged, it may change the upstream behavior enough to require 
adjusting this approach.
   - Should a test service account be created in LDAP for more realistic 
integration testing, or is enriching the synthetic data sufficient?
   - Does `sql.UserSession` have fields for all the realistic session 
attributes (fullname, dn, mfa)? The model fields need to be confirmed to ensure 
_realistic_session_data doesn't pass fields that get silently dropped by 
`_SESSION_DATA_FIELDS` filtering.
   - The issue mentions solving the read/write asymmetry 'in ASFQuart' - should 
the alias handling live in ATR's _prepare_session_data (as proposed) or should 
it be pushed upstream to ASFQuart's write() function?
   
   ### Files examined
   - `atr/sessions.py`
   - `atr/docs/asfquart-usage.md`
   - `atr/docs/authentication-security.md`
   - `atr/principal.py`
   - `tests/unit/test_sessions.py`
   - `typestubs/asfquart/session.pyi`
   - `atr/docs/authorization-security.md`
   - `atr/ldap.py`
   
   ---
   *Draft from a triage agent. A human reviewer should validate before merging 
any change. The agent did not run tests or verify diffs apply.*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to