[I] Limit regex complexity in user-supplied ignore patterns (tooling-trusted-releases)

via GitHub Tue, 27 Jan 2026 13:28:19 -0800


andrewmusselman opened a new issue, #599:
URL: https://github.com/apache/tooling-trusted-releases/issues/599


   ## Summary
   
   User-supplied ignore patterns are compiled as regular expressions without 
any complexity limits. A carefully crafted pattern can cause catastrophic 
backtracking (ReDoS), consuming CPU and causing denial of service.
   
   ## ASVS Requirements
   
   - 2.1.1 - Verify that input validation protects against DoS
   - 2.2.1 - Verify that input is validated against expected patterns
   
   ## Related Audit Reports
   
   - [2.2.1.md](ASVS/reports/44ee502/L1/2.2.1.md) - ReDoS findings
   - [2.1.1.md](ASVS/reports/44ee502/L1/2.1.1.md) - Input validation findings
   
   ## Affected Files
   
   - `atr/storage/readers/checks.py` - Ignore pattern processing
   
   ## Current Behavior
   
   User-supplied patterns are compiled directly with `re.compile()` without:
   - Timeout limits on matching
   - Pattern complexity validation
   - Alternative safe matching algorithms
   
   ## Risk
   
   - Denial of service through CPU exhaustion
   - Worker process starvation
   - Release pipeline disruption
   - Resource exhaustion affecting other users
   
   # ReDoS Exploit Example: Ignore Pattern Vulnerability
   
   ## The Attack
   
   ### Malicious Ignore Pattern
   
   ```regex
   ^(a+)+$
   ```
   
   ### Input String That Triggers Catastrophic Backtracking
   
   ```
   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaX
   ```
   
   ---
   
   ## Why It's Slow
   
   The regex engine tries every possible way to divide the `a`s between the 
inner `a+` and outer `()+` before finally failing at the `X`. 
   
   With N characters, this is **O(2^N)** complexity - exponential time.
   
   For example, with the string `aaaX`, the engine tries:
   - `(aaa)` → fails at X
   - `(aa)(a)` → fails at X  
   - `(a)(aa)` → fails at X
   - `(a)(a)(a)` → fails at X
   
   With 30 `a`s, there are over **1 billion** combinations to try.
   
   ---
   
   ## Proof of Concept
   
   ```python
   import re
   import time
   
   # Malicious pattern a user could submit as an "ignore pattern"
   pattern = re.compile(r'^(a+)+$')
   
   # Test with increasing lengths
   for length in [20, 25, 28, 30]:
       test_string = 'a' * length + 'X'
       start = time.time()
       pattern.match(test_string)
       elapsed = time.time() - start
       print(f"Length {length}: {elapsed:.2f} seconds")
   ```
   
   ### Output (approximate)
   
   ```
   Length 20: 0.05 seconds
   Length 25: 1.50 seconds
   Length 28: 12.00 seconds
   Length 30: 48.00 seconds
   ```
   
   Each additional character roughly **doubles** the time. At length 40, you're 
looking at **days**.
   
   ---
   
   ## Other Malicious Patterns
   
   | Pattern | Description |
   |---------|-------------|
   | `(.*a){25}` | Overlapping `.*` with literal |
   | `([a-zA-Z]+)*$` | Nested quantifiers on character class |
   | `(a\|aa)+$` | Overlapping alternatives |
   | `\d+\.\d+\.\d+\.\d+` | When matched against `1.1.1.1.1.1.1.1.1.1.1.1X` |
   
   ---
   
   ## Attack Scenario
   
   1. Attacker submits `^(a+)+$` as an ignore pattern via the web interface
   2. Attacker uploads a file named `aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaX`
   3. When the system checks if the filename matches the ignore pattern, the 
worker process **hangs indefinitely**
   4. Worker thread/process is consumed, potentially causing denial of service
   
   ---
   
   ## Mitigation
   
   1. **Use RE2** - Google's RE2 library guarantees linear-time matching
   2. **Use fnmatch** - Switch to glob patterns instead of regex
   3. **Pattern validation** - Reject patterns with nested quantifiers
   4. **Length limits** - Cap pattern length at 500 characters
   5. **Timeouts** - Set execution timeout on regex operations
   
   ```python
   # Safe alternative using Google RE2
   import re2
   
   def compile_safe_pattern(pattern: str):
       return re2.compile(pattern)  # Guaranteed O(n) matching
   ```
   
   ## Recommended Fix
   
   ```python
   # atr/storage/readers/checks.py
   import re
   import re2  # Google's RE2 library (pip install google-re2)
   from typing import Optional
   
   MAX_PATTERN_LENGTH: int = 500
   DANGEROUS_PATTERNS = [
       r'(\+\+|\*\*|\?\?)',  # Nested quantifiers
       r'\(\?[^)]*\)',  # Lookahead/lookbehind
   ]
   
   def compile_safe_pattern(pattern: str) -> Optional[re.Pattern]:
       """Compile a user-supplied pattern safely, or return None if invalid."""
       # Length limit
       if len(pattern) > MAX_PATTERN_LENGTH:
           raise ValueError(f"Pattern too long: {len(pattern)} > 
{MAX_PATTERN_LENGTH}")
       
       # Check for dangerous constructs
       for dangerous in DANGEROUS_PATTERNS:
           if re.search(dangerous, pattern):
               raise ValueError(f"Pattern contains potentially dangerous 
construct")
       
       # Use RE2 for guaranteed linear-time matching
       try:
           return re2.compile(pattern)
       except re2.error as e:
           raise ValueError(f"Invalid pattern: {e}")
   
   # Alternative: Use fnmatch for simple glob patterns instead of regex
   import fnmatch
   
   def match_ignore_pattern(pattern: str, filename: str) -> bool:
       """Match using fnmatch (glob-style) instead of regex."""
       return fnmatch.fnmatch(filename, pattern)
   ```
   
   ## Acceptance Criteria
   
   - [ ] Pattern length limit enforced (recommend 500 chars)
   - [ ] Dangerous regex constructs rejected or pattern complexity score 
implemented
   - [ ] Consider switching to `fnmatch` glob patterns instead of regex
   - [ ] Consider using Google RE2 library for guaranteed linear-time matching
   - [ ] Timeout on pattern matching operations as defense-in-depth
   - [ ] Test cases for ReDoS patterns (e.g., `(a+)+$`)
   - [ ] User feedback for rejected patterns


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[I] Limit regex complexity in user-supplied ignore patterns (tooling-trusted-releases)

Reply via email to