asf-tooling commented on issue #618:
URL:
https://github.com/apache/tooling-trusted-releases/issues/618#issuecomment-4410133984
<!-- gofannon-issue-triage-bot v2 -->
**Automated triage** — analyzed at `main@2da7807a`
**Type:** `refactor` • **Classification:** `actionable` •
**Confidence:** `medium`
**Application domain(s):** `authentication_authorization`,
`shared_infrastructure`
### Summary
Issue #618 asks to improve session test data to use realistic workflows
(from ASFQuart/LDAP) instead of synthesized data, and to address the
`read`/`write` asymmetry where ASFQuart's `write()` takes `pmcs` but `read()`
returns `committees`. @alitheg noted an open PR (infrastructure-asfquart#49)
that would remove the need for the before_request re-write. @dave2wave
suggested treating `committees` as an alias for `pmcs` in
`_prepare_session_data` and erroring if both are present. The code in
`_prepare_session_data` already maps `pmcs`→`committees`, but doesn't accept
`committees` directly as an alias, and the test helper `_session_data()` uses
minimal synthetic fields.
### Where this lives in the code today
#### `atr/sessions.py` — `_prepare_session_data` (lines 112-133)
_needs modification_
This function maps raw OAuth dict keys to internal UserSession fields but
doesn't handle the case where `committees` is passed directly (the alias
@dave2wave suggested), nor does it error when both `pmcs` and `committees` are
present.
```python
def _prepare_session_data(session_data: dict[str, Any]) -> dict[str, Any]:
# Field names match the raw OAuth dict from asfquart
# They do not match ClientSession attribute names
# ClientSession remaps on construction
# Raw "pmcs" becomes ClientSession.committees
# Raw "roleaccount" becomes ClientSession.isRole
# Since awrite() passes the raw dict, we map from the raw key names here
data = dict(session_data)
if "pmcs" in data:
data["committees"] = data.pop("pmcs")
if "isMember" in data:
data["is_member"] = data.pop("isMember")
if "isChair" in data:
data["is_chair"] = data.pop("isChair")
# if "isRoot" in data:
# data["is_root"] = data.pop("isRoot")
if "roleaccount" in data:
data["is_role"] = data.pop("roleaccount")
data.pop("metadata", None)
data.pop("cts", None)
data.pop("uts", None)
return {k: v for k, v in data.items() if k in _SESSION_DATA_FIELDS}
```
#### `tests/unit/test_sessions.py` — `_session_data` (lines 266-269)
_needs modification_
The test helper uses minimal synthesized session data that doesn't reflect
realistic ASFQuart/LDAP output (missing fields like fullname, email, dn,
isMember, isChair, mfa).
```python
def _session_data(uid: str = "alice", **kwargs) -> dict:
base = {"uid": uid, "pmcs": ["test"], "projects": ["test"]}
base.update(kwargs)
return base
```
#### `tests/unit/test_sessions.py` — `test_create_maps_pmcs_to_committees`
(lines 55-60)
_currently does this_
Tests the pmcs→committees mapping but only with minimal synthetic data;
doesn't test the alias case or error on both fields present.
```python
async def test_create_maps_pmcs_to_committees(store):
hsid = _hsid()
await store.create(hsid, _session_data())
user_session = await store.validate(hsid)
assert user_session is not None
assert user_session.committees == ["test"]
```
#### `atr/principal.py` — `AuthoriserLDAP.cache_refresh` (lines 277-302)
_currently does this_
Shows how LDAP authoriser already has a Debug-mode path reading from a
session cache file, and test mode with hardcoded data - a potential extension
point for more realistic test data.
```python
async def cache_refresh(self, asf_uid: str) -> None:
if not self.__cache.outdated(asf_uid):
return
if config.is_test_mode() and (asf_uid == "test"):
# The test user does not exist in LDAP, so we hardcode their data
committees = frozenset({"test"})
projects = frozenset({"test"})
self.__cache.member_of[asf_uid] = committees
self.__cache.participant_of[asf_uid] = projects
self.__cache.last_refreshed[asf_uid] = int(time.time())
return
if config.get_mode() == config.Mode.Debug:
session_cache = await util.session_cache_read()
if asf_uid in session_cache:
cached_session = session_cache[asf_uid]
committees = frozenset(cached_session.get("pmcs", []))
projects = frozenset(cached_session.get("projects", []))
committees, projects = _augment_test_membership(committees,
projects)
self.__cache.member_of[asf_uid] = committees
self.__cache.participant_of[asf_uid] = projects
self.__cache.last_refreshed[asf_uid] = int(time.time())
log.info(f"Loaded session data for {asf_uid} from session
cache file")
return
```
#### `typestubs/asfquart/session.pyi` — `ClientSession` (lines 9-26)
_currently does this_
Shows the read-side ClientSession type with `committees` field, which is
what ASFQuart returns after the internal remapping from raw `pmcs`.
```python
class ClientSession(dict):
uid: str | None
dn: str | None
fullname: str | None
email: str
isMember: bool
isChair: bool
isRoot: bool
committees: list[str]
projects: list[str]
mfa: bool
isRole: bool
metadata: dict
def __init__(self, raw_data: dict) -> None:
"""Initializes a client session from a raw dict. ClientSession is a
subclassed dict, so that
we can send it to quart in a format it can render."""
...
```
### Where new code would go
- `tests/unit/test_sessions.py` — after symbol _session_data
Add a more realistic session data fixture that mirrors actual ASFQuart
OAuth output, including all fields like fullname, dn, email, isMember, isChair,
mfa
### Proposed approach
The fix has two parts. First, make `_prepare_session_data` accept
`committees` as an alias for `pmcs` (as @dave2wave suggested), raising a
`ValueError` if both are present simultaneously. This handles the read/write
asymmetry at ATR's boundary without requiring upstream ASFQuart changes.
Second, enrich the test helper `_session_data()` to include realistic fields
that ASFQuart's OAuth flow actually provides (fullname, dn, email, isMember,
isChair, mfa, projects), and add test cases that exercise both the `pmcs` path
and the `committees` alias path. This ensures tests catch issues like #617
where synthesized data diverges from real session data. The external
infrastructure-asfquart#49 PR may eventually remove the asymmetry at source,
but ATR should be defensive regardless.
### Suggested patches
#### `atr/sessions.py`
Accept `committees` as alias for `pmcs` and error if both present, per
@dave2wave's suggestion
````diff
--- a/atr/sessions.py
+++ b/atr/sessions.py
@@ -100,10 +100,16 @@ def _prepare_session_data(session_data: dict[str,
Any]) -> dict[str, Any]:
# Raw "pmcs" becomes ClientSession.committees
# Raw "roleaccount" becomes ClientSession.isRole
# Since awrite() passes the raw dict, we map from the raw key names here
data = dict(session_data)
+ if "pmcs" in data and "committees" in data:
+ raise ValueError("Session data must not contain both 'pmcs' and
'committees'")
if "pmcs" in data:
data["committees"] = data.pop("pmcs")
+ # Accept 'committees' directly as an alias (e.g. from ClientSession
read output)
+ # No remapping needed since it already matches the UserSession field
name
if "isMember" in data:
data["is_member"] = data.pop("isMember")
+ if "is_member" not in data and "isMember" not in session_data:
+ pass # Field not present, will use model default
if "isChair" in data:
data["is_chair"] = data.pop("isChair")
# if "isRoot" in data:
````
#### `tests/unit/test_sessions.py`
Add a realistic session data helper and tests for the committees alias and
conflict detection
````diff
--- a/tests/unit/test_sessions.py
+++ b/tests/unit/test_sessions.py
@@ -230,6 +230,51 @@ def _hsid() -> str:
return f"test-hash-{_COUNTER}"
def _session_data(uid: str = "alice", **kwargs) -> dict:
base = {"uid": uid, "pmcs": ["test"], "projects": ["test"]}
base.update(kwargs)
return base
+
+
+def _realistic_session_data(uid: str = "alice", **kwargs) -> dict:
+ """Session data that mirrors actual ASFQuart OAuth callback output."""
+ base = {
+ "uid": uid,
+ "dn": f"uid={uid},ou=people,dc=apache,dc=org",
+ "fullname": "Alice Example",
+ "email": f"{uid}@apache.org",
+ "pmcs": ["test"],
+ "projects": ["test"],
+ "isMember": False,
+ "isChair": False,
+ "roleaccount": False,
+ "mfa": True,
+ }
+ base.update(kwargs)
+ return base
+
+
+async def test_create_with_realistic_session_data(store):
+ hsid = _hsid()
+ await store.create(hsid, _realistic_session_data())
+ user_session = await store.validate(hsid)
+ assert user_session is not None
+ assert user_session.uid == "alice"
+ assert user_session.committees == ["test"]
+ assert user_session.projects == ["test"]
+ assert user_session.is_member is False
+ assert user_session.is_chair is False
+ assert user_session.is_role is False
+
+
+async def test_create_accepts_committees_as_alias(store):
+ """committees can be passed directly (e.g. from ClientSession read
output)."""
+ hsid = _hsid()
+ await store.create(hsid, {"uid": "bob", "committees": ["infra"],
"projects": ["infra"]})
+ user_session = await store.validate(hsid)
+ assert user_session is not None
+ assert user_session.committees == ["infra"]
+
+
+async def test_create_rejects_both_pmcs_and_committees(store):
+ """Having both pmcs and committees in session data is ambiguous and
rejected."""
+ hsid = _hsid()
+ with pytest.raises(ValueError, match="must not contain both"):
+ await store.create(hsid, {"uid": "eve", "pmcs": ["a"],
"committees": ["b"]})
````
### Open questions
- What is the status of infrastructure-asfquart PR #49 mentioned by
@alitheg? If merged, it may change the upstream behavior enough to require
adjusting this approach.
- Should a test service account be created in LDAP for more realistic
integration testing, or is enriching the synthetic data sufficient?
- Does `sql.UserSession` have fields for all the realistic session
attributes (fullname, dn, mfa)? The model fields need to be confirmed to ensure
_realistic_session_data doesn't pass fields that get silently dropped by
`_SESSION_DATA_FIELDS` filtering.
- The issue mentions solving the read/write asymmetry 'in ASFQuart' - should
the alias handling live in ATR's _prepare_session_data (as proposed) or should
it be pushed upstream to ASFQuart's write() function?
### Files examined
- `atr/sessions.py`
- `atr/docs/asfquart-usage.md`
- `atr/docs/authentication-security.md`
- `atr/principal.py`
- `tests/unit/test_sessions.py`
- `typestubs/asfquart/session.pyi`
- `atr/docs/authorization-security.md`
- `atr/ldap.py`
---
*Draft from a triage agent. A human reviewer should validate before merging
any change. The agent did not run tests or verify diffs apply.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]