andrewmusselman opened a new issue, #1256:
URL: https://github.com/apache/tooling-trusted-releases/issues/1256

   ### Summary
   
   `AddProjectForm` rejects display names for real, current Apache projects 
that contain `-` or `_`, including `Apache Empire-db` and every `Apache mod_*` 
project. The per-word case check accepts these names, via either the whitelist 
of irregular words or the `mod_*` pattern, but the subsequent whole-string 
character check does not.
   
   ### Affected projects
   
   Real TLPs/podlings that cannot currently be added or renamed to their actual 
display name:
   
   - `Apache Empire-db`
   - `Apache mod_jk`, `Apache mod_perl`, `Apache mod_ftp`, `Apache mod_python`, 
and the rest of the `mod_*` family
   
   ### Repro
   
   From the repo root:
   
   ```bash
   uv run python -c "
   from atr.shared.projects import AddProjectForm
   AddProjectForm(
       csrf_token='',
       committee_key='empire-db',
       display_name='Apache Empire-db',
       key='empire-db',
   )
   "
   ```
   
   Produces:
   
   ```
   display_name
     Value error, Name must be alphanumeric and may include spaces or dots or 
plus signs.
     [type=value_error, input_value='Apache Empire-db', input_type=str]
   ```
   
   Same error for `Apache mod_jk`, `Apache mod_perl`, etc.
   
   ### Where the bug lives
   
   `atr/shared/projects.py`, in the display-name validation in 
`AddProjectForm.validate_fields`. Two checks run in sequence:
   
   1. **Per-word case check.** Each word after `Apache` must match 
`PascalCase`, `camelCase`, or `^mod(_[0-9a-z]+)+$`, or be in 
`allowed_irregular_words = {".NET", "C++", "Empire-db", "Lucene.NET", "for", 
"jclouds"}`. ✅ Respects the whitelist.
   2. **Whole-string character check.** ```display_name.replace(" ", 
"").replace(".", "").replace("+", "").isalnum()``` must be true. ❌ Does **not** 
respect the whitelist, and only strips spaces, dots, and plus signs. Hyphens 
and underscores are not stripped, so `Empire-db` and `mod_jk` fail.
   
   So `Apache Empire-db` clears check 1 (whitelist match) and dies on check 2 
(the `-` is not alphanumeric). `Apache mod_jk` clears check 1 via the 
`r_mod_case` regex and dies on check 2 on the `_`.
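
To see check 2 fail in isolation, here is a minimal standalone sketch. It copies the `replace(...).isalnum()` expression quoted above; the function name `passes_whole_string_check` is hypothetical and exists only for this illustration, not in `atr/shared/projects.py`.

```python
def passes_whole_string_check(display_name: str) -> bool:
    """Check 2 as quoted in this issue: strip spaces, dots, and plus
    signs, then require the remainder to be alphanumeric."""
    stripped = display_name.replace(" ", "").replace(".", "").replace("+", "")
    return stripped.isalnum()


for name in ("Apache Lucene.NET", "Apache C++", "Apache Empire-db", "Apache mod_jk"):
    print(f"{name!r}: {passes_whole_string_check(name)}")
# 'Apache Lucene.NET': True
# 'Apache C++': True
# 'Apache Empire-db': False   (the '-' survives the stripping)
# 'Apache mod_jk': False      (the '_' survives the stripping)
```

The whitelisted names that contain only dots or plus signs get through, which is probably why the gap went unnoticed.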
   
   ### Suggested fixes
   
   Two options, either works:
   
   **(a) Strip the extra characters too.** Smaller change, preserves the "every 
character must be one of these" invariant:
   
   ```python
   stripped = (
       display_name.replace(" ", "").replace(".", "").replace("+", "")
       .replace("-", "").replace("_", "")
   )
   if not stripped.isalnum():
       raise ValueError(
           "Name must be alphanumeric and may include spaces, dots, "
           "plus signs, hyphens, or underscores."
       )
   ```
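
If the chain of `.replace()` calls keeps growing, the same check can be sketched with `str.translate`, which keeps the permitted punctuation in one place. This is only an illustration of option (a): `ALLOWED_PUNCTUATION` and `has_only_allowed_characters` are hypothetical names, not identifiers from the ATR codebase.

```python
# Hypothetical constant: every non-alphanumeric character a display name may contain.
ALLOWED_PUNCTUATION = " .+-_"

# Translation table that deletes each allowed punctuation character.
_STRIP_ALLOWED = str.maketrans("", "", ALLOWED_PUNCTUATION)


def has_only_allowed_characters(display_name: str) -> bool:
    """Option (a): after removing the allowed punctuation, the rest must be alphanumeric."""
    return display_name.translate(_STRIP_ALLOWED).isalnum()


print(has_only_allowed_characters("Apache Empire-db"))  # True
print(has_only_allowed_characters("Apache mod_jk"))     # True
print(has_only_allowed_characters("Apache Foo!"))       # False
```

Note that with option (a) the character check accepts hyphens and underscores anywhere in the string, so check 1 remains the only guard on where they may appear.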
   
   **(b) Skip check 2 for words that already passed check 1.** Cleaner 
conceptually — once a word is on the whitelist or matches a structural regex, 
it's by definition allowed:
   
   ```python
   for display_name_word in display_name_words[1:]:
       if display_name_word in allowed_irregular_words:
           continue
       if (
           r_pascal_case.match(display_name_word)
           or r_camel_case.match(display_name_word)
           or r_mod_case.match(display_name_word)
       ):
           continue
       raise ValueError(
           "Display name words must be in PascalCase, camelCase, or mod_ case."
       )
   # drop the .isalnum() check entirely
   ```
   
   I'd lean toward (b) since check 2 is already partially redundant with check 
1 — if every word individually passed a structural check, the whole string is 
well-formed by construction. (a) is the safer change if you want to keep the 
belt-and-braces structure.
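
A standalone sketch of option (b) illustrates the "well-formed by construction" argument. The whitelist and the `mod_` regex are quoted from this issue; the PascalCase and camelCase patterns and the space-based word splitting are assumptions made for this example, since the real `r_pascal_case` / `r_camel_case` definitions are not reproduced here.

```python
import re

# ASSUMED stand-ins for the real patterns in atr/shared/projects.py.
R_PASCAL_CASE = re.compile(r"^(?:[A-Z][a-z0-9]*)+$")
R_CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$")
# Quoted from the issue text.
R_MOD_CASE = re.compile(r"^mod(_[0-9a-z]+)+$")
ALLOWED_IRREGULAR_WORDS = {".NET", "C++", "Empire-db", "Lucene.NET", "for", "jclouds"}


def words_are_valid(display_name: str) -> bool:
    """Option (b): the per-word check is the only check; no isalnum() pass."""
    words = display_name.split(" ")  # assumed splitting rule
    for word in words[1:]:  # skip the leading "Apache"
        if word in ALLOWED_IRREGULAR_WORDS:
            continue
        if R_PASCAL_CASE.match(word) or R_CAMEL_CASE.match(word) or R_MOD_CASE.match(word):
            continue
        return False
    return True


print(words_are_valid("Apache Empire-db"))  # True: whitelisted word
print(words_are_valid("Apache mod_jk"))     # True: matches the mod_ pattern
print(words_are_valid("Apache bad-name"))   # False: hyphen outside the whitelist
```

Under these assumed patterns, punctuation can only enter a name through the whitelist or the `mod_` regex, so a separate character check adds nothing. Whether that holds for the real patterns would need confirming before dropping check 2.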
   
   Either way, please add `Apache Empire-db` and `Apache mod_jk` to the test 
cases so this doesn't regress:
   
   ```python
   from atr.shared.projects import AddProjectForm

   def test_empire_db_is_accepted():
       # Constructing the form raises a validation error if the name is rejected.
       AddProjectForm(csrf_token='', committee_key='empire-db',
                      display_name='Apache Empire-db', key='empire-db')
   
   def test_mod_jk_is_accepted():
       AddProjectForm(csrf_token='', committee_key='httpd',
                      display_name='Apache mod_jk', key='httpd-mod_jk')
   ```
   
   ### Found while doing
   
   #1254.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

