asf-tooling commented on issue #1075:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/1075#issuecomment-4409800447

   <!-- gofannon-issue-triage-bot v2 -->
   
   **Automated triage** — analyzed at `main@2da7807a`
   
   **Type:** `bug_fix`  •  **Classification:** `actionable`  •  **Confidence:** 
`medium`
   **Application domain(s):** `cryptographic_keys`
   
   ### Summary
   The issue reports that bulk PGP key block processing in 
`atr/storage/writers/keys.py` has no limit on the number of key blocks 
processed per request, allowing CPU exhaustion. The specific function 
`add_bulk_public_keys` cited in the issue does NOT exist in the codebase; the 
actual bulk processing occurs through `CommitteeParticipant.__ensure()` (called 
by `ensure_associated` and `ensure_stored`). There IS a file size limit 
(`_MAX_KEYS_SIZE = 10MB`) but no count limit on individual key blocks within 
that payload. @dave2wave noted that the limit should be twice the current 
maximum of any existing KEYS file rather than an arbitrary 100.
   
   ### Where this lives in the code today
   
   #### `atr/storage/writers/keys.py` — 
`CommitteeParticipant.ensure_associated` (lines 671-681)
   _needs modification_
   Entry point for bulk key processing that delegates to __ensure() without any 
count limit
   
   ```python
       async def ensure_associated(
           self, keys_file_text: str, ldap_data: dict[str, str] | None = None
       ) -> outcome.List[types.Key]:
           outcomes: outcome.List[types.Key] = await self.__ensure(
               keys_file_text,
               associate=True,
               ldap_data=ldap_data,
           )
           if outcomes.any_result:
               await self.autogenerate_keys_file()
           return outcomes
   ```
   
   #### `atr/storage/writers/keys.py` — `CommitteeParticipant.ensure_stored` 
(lines 683-693)
   _needs modification_
   Another entry point for bulk key processing without count limit
   
   ```python
       async def ensure_stored(
           self, keys_file_text: str, ldap_data: dict[str, str] | None = None
       ) -> outcome.List[types.Key]:
           outcomes: outcome.List[types.Key] = await self.__ensure(
               keys_file_text,
               associate=False,
               ldap_data=ldap_data,
           )
           if outcomes.any_result:
               await self.autogenerate_keys_file()
           return outcomes
   ```
   
   #### `atr/storage/writers/keys.py` — `FoundationCommitter.__ensure_one` 
(lines 491-505)
   _currently does this_
   The single-key path enforces exactly one block - shows the pattern for 
parsing key blocks via util.parse_key_blocks()
   
   ```python
       async def __ensure_one(self, key_file_text: str, associate: bool = True) 
-> outcome.Outcome[types.Key]:
           try:
               key_blocks = util.parse_key_blocks(key_file_text)
           except Exception as e:
               return outcome.Error(e)
           if len(key_blocks) != 1:
               return outcome.Error(ValueError("Expected one key block, got 
none or multiple"))
           key_block = key_blocks[0]
           try:
               ldap_data = await util.email_to_uid_map()
               key = await asyncio.to_thread(self.__block_model, key_block, 
ldap_data)
           except Exception as e:
               return outcome.Error(e)
           oc = await self.__database_add_model(key)
           return oc
   ```
   
   #### `atr/post/keys.py` — `_process_keys` (lines 369-391)
   _currently does this_
   Web handler that passes keys text to ensure_associated without any block 
count validation
   
   ```python
   async def _process_keys(keys_text: str, selected_committee: str) -> str:
       """Process keys text and associate with committee."""
       if util.contains_private_key_text(keys_text):
           del keys_text
           gc.collect()
           await quart.flash(util.PRIVATE_KEY_UPLOAD_WARNING, "error")
           return await shared.keys.render_upload_page(error=True)
   
       async with storage.write() as write:
           wacp = write.as_committee_participant(selected_committee)
           outcomes = await wacp.keys.ensure_associated(keys_text)
   
       success_count = outcomes.result_count
       error_count = outcomes.error_count
       total_count = success_count + error_count
   
       message = f"Processed {util.plural(total_count, 'key')}: {success_count} 
successful"
       if error_count > 0:
           message += f", {error_count} failed"
   
       await quart.flash(message, "success" if (success_count > 0) else "error")
   
       return await shared.keys.render_upload_page(results=outcomes, 
submitted_committees=[selected_committee])
   ```
   
   ### Where new code would go
   - `atr/storage/writers/keys.py` — after _RSA_MINIMUM_BITS constant definition
     Add _MAX_KEY_BLOCKS_PER_REQUEST constant alongside other module-level 
limits
   
   ### Proposed approach
   The fix should add a `_MAX_KEY_BLOCKS_PER_REQUEST` constant in 
`atr/storage/writers/keys.py` and enforce it in the 
`CommitteeParticipant.__ensure()` private method, which is the shared 
implementation for `ensure_associated` and `ensure_stored`. Since `__ensure()` 
likely calls `util.parse_key_blocks()` (following the same pattern as 
`__ensure_one`), the limit check should be applied immediately after parsing 
key blocks, before iterating over them for CPU-intensive PGP parsing. Per 
@dave2wave's guidance, the value should be determined by checking the maximum 
number of keys in any existing KEYS file and setting the limit to twice that. 
The comment in `_MAX_KEYS_SIZE` notes that Apache Subversion's KEYS file is the 
largest at ~3.7MB, which at typical PGP key sizes (~3-10KB per key) could 
contain 370-1200 keys. A conservative limit of 500 or checking actual data 
would be appropriate. Since the `__ensure()` implementation is in the truncated 
portion of the file, the diff targets the con
 stant definition and the visible entry points, with a note that the actual 
enforcement belongs in `__ensure()`.
   
   ### Suggested patches
   
   #### `atr/storage/writers/keys.py`
   Add the maximum key blocks constant near other module-level constants
   
   ````diff
   --- a/atr/storage/writers/keys.py
   +++ b/atr/storage/writers/keys.py
   @@ -83,6 +83,10 @@ _EC_MINIMUM_BITS: Final = 255
    _RSA_ALGORITHMS: Final = frozenset({1, 3})
    _RSA_MINIMUM_BITS: Final = 4096
    
   +# TODO: Confirm this is >= 2x the maximum key count in any existing KEYS 
file
   +# (per dave2wave's guidance). Apache Subversion has the largest KEYS file.
   +_MAX_KEY_BLOCKS_PER_REQUEST: Final = 500
   +
    
    def _algorithm_id(name: str) -> int:
        if algorithm_id := _ALGORITHM_IDS.get(name):
   ````
   
   #### `atr/storage/writers/keys.py`
   Add key block count validation in ensure_associated before delegating to 
__ensure
   
   ````diff
   --- a/atr/storage/writers/keys.py
   +++ b/atr/storage/writers/keys.py
   @@ -428,6 +428,14 @@ class CommitteeParticipant(FoundationCommitter):
        async def ensure_associated(
            self, keys_file_text: str, ldap_data: dict[str, str] | None = None
        ) -> outcome.List[types.Key]:
   +        # Enforce maximum key block count to prevent CPU exhaustion
   +        try:
   +            key_blocks = util.parse_key_blocks(keys_file_text)
   +        except Exception as e:
   +            return outcome.List([outcome.Error(e)])
   +        if len(key_blocks) > _MAX_KEY_BLOCKS_PER_REQUEST:
   +            err = ValueError(f"Cannot process more than 
{_MAX_KEY_BLOCKS_PER_REQUEST} key blocks in a single request. Received 
{len(key_blocks)} blocks.")
   +            return outcome.List([outcome.Error(err)])
            outcomes: outcome.List[types.Key] = await self.__ensure(
                keys_file_text,
                associate=True,
   @@ -440,6 +448,14 @@ class CommitteeParticipant(FoundationCommitter):
        async def ensure_stored(
            self, keys_file_text: str, ldap_data: dict[str, str] | None = None
        ) -> outcome.List[types.Key]:
   +        # Enforce maximum key block count to prevent CPU exhaustion
   +        try:
   +            key_blocks = util.parse_key_blocks(keys_file_text)
   +        except Exception as e:
   +            return outcome.List([outcome.Error(e)])
   +        if len(key_blocks) > _MAX_KEY_BLOCKS_PER_REQUEST:
   +            err = ValueError(f"Cannot process more than 
{_MAX_KEY_BLOCKS_PER_REQUEST} key blocks in a single request. Received 
{len(key_blocks)} blocks.")
   +            return outcome.List([outcome.Error(err)])
            outcomes: outcome.List[types.Key] = await self.__ensure(
                keys_file_text,
                associate=False,
   ````
   
   ### Open questions
   - The `__ensure()` method implementation is in the truncated portion of the 
file - the limit check may be better placed inside `__ensure()` itself to avoid 
double-parsing key blocks (once for the limit check, once inside __ensure). 
Need to verify the actual implementation.
   - The exact value for `_MAX_KEY_BLOCKS_PER_REQUEST` should be determined by 
checking existing KEYS files as @dave2wave suggested. The Apache Subversion 
KEYS file is noted as the largest at ~3.7MB - need to count how many key blocks 
it contains and set the limit to 2x that.
   - The function `add_bulk_public_keys` referenced in the issue body at line 
388 does not exist in the codebase. The actual bulk processing is done via 
`CommitteeParticipant.__ensure()`. The issue may have been generated from a 
security scanning tool with imprecise code references.
   - The proposed diff may cause `util.parse_key_blocks()` to be called twice 
(once in the guard, once inside `__ensure`). A cleaner approach might be to 
pass pre-parsed key blocks to `__ensure()` or add the check only inside 
`__ensure()`, which I cannot see due to file truncation.
   - Need to verify that `outcome.List([outcome.Error(err)])` is the correct 
way to construct an error outcome list - the exact constructor signature for 
`outcome.List` is not visible in the provided source.
   
   ### Files examined
   - `atr/storage/writers/keys.py`
   - `atr/post/keys.py`
   - `atr/shared/keys.py`
   - `atr/tasks/keys.py`
   - `atr/get/keys.py`
   
   ---
   *Draft from a triage agent. A human reviewer should validate before merging 
any change. The agent did not run tests or verify diffs apply.*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to