dave2wave commented on issue #388:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/388#issuecomment-3702883085

   Some version-matching code from ChatGPT that should work for all the
paths I've observed. It will need extending for cases with both a Scala
version and a release version.
   
   See https://chatgpt.com/s/t_69558cae4ee48191888f1ed42cc8ada8
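
For the Scala-plus-release case mentioned above, one possible direction is to collect every version token instead of stopping at the first. This is only a rough sketch: `VERSION_TOKEN` and `find_all_versions` are hypothetical names, the regex is a simplified stand-in for the fuller pattern below (dots only), and the artifact name is just an illustration. Deciding which token is the Scala version and which is the release version is left to the caller.

```python
import re

# Simplified stand-in for a dotted version token; not the full pattern from this comment.
VERSION_TOKEN = re.compile(r"(?<!\d)\d+(?:\.\d+)+(?!\d)")


def find_all_versions(name: str) -> list[str]:
    """Return every dotted version token in ``name``, left to right."""
    return [m.group(0) for m in VERSION_TOKEN.finditer(name)]


# Illustrative artifact name carrying two version tokens.
print(find_all_versions("kafka_2.13-3.7.0.tgz"))  # ['2.13', '3.7.0']
```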
   
   Below is a **small, clean extension** of the existing API that returns the 
**unmatched prefix and suffix** around the detected token, while preserving all 
prior guarantees:
   
   * ✅ calendar-accurate dates
   * ✅ date beats version
   * ✅ no overlaps
   * ✅ named capture groups
   * ✅ consistent separators
   
   ---
   
   ## Returned structure
   
   ```python
   {
       "kind": "date" | "version",
       "value": "matched token",
       "prefix": "text before match",
       "suffix": "text after match",
       "match": re.Match,
   }
   ```
   
   ---
   
   ## Updated implementation
   
   ```python
   import re
   from typing import Literal, Optional, TypedDict
   from datetime import date
   
   
   class VersionDateMatch(TypedDict):
       kind: Literal["date", "version"]
       value: str
       prefix: str
       suffix: str
       match: re.Match[str]  # re.Match replaces the deprecated typing.Match alias
   
   
   # ---------------------------
   # DATE REGEX (structure only)
   # ---------------------------
   DATE_PATTERN = re.compile(
       r"""
       (?<!\d)
       (?P<date>
           (?P<year>[12]\d{3})
           (?:
               (?P<sep>[._-])
               (?P<month>0[1-9]|1[0-2])
               (?:
                   (?P=sep)
                   (?P<day>0[1-9]|[12]\d|3[01])
               )?
           )?
       )
       (?!\d)
       """,
       re.VERBOSE,
   )
   
   
   # ---------------------------
   # VERSION REGEX
   # ---------------------------
   VERSION_PATTERN = re.compile(
       r"""
       (?<!\d)
       (?P<version>
           (?P<major>0|[1-9]\d*)
           (?:
               (?P<sep>[._-])
               (?P<minor>0|[1-9]\d*)
               (?:
                   (?P=sep)
                   (?P<patch>0|[1-9]\d*)
                   (?:
                       (?P=sep)
                       (?P<build>0|[1-9]\d*)
                   )?
               )?
           )?
       )
       (?!\d)
       """,
       re.VERBOSE,
   )
   ```
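
The "consistent separators" guarantee comes from the `(?P=sep)` backreference: whatever separator the first `sep` group captured must appear again later. A minimal illustration, using a trimmed-down pattern (`SAME_SEP` is a hypothetical name, not part of the code above):

```python
import re

# Trimmed-down date shape, for illustration only:
# year, separator, month, the same separator again, day.
SAME_SEP = re.compile(
    r"(?P<year>\d{4})(?P<sep>[._-])(?P<month>\d{2})(?P=sep)(?P<day>\d{2})"
)

print(bool(SAME_SEP.fullmatch("2024-02-29")))  # True: both separators are "-"
print(bool(SAME_SEP.fullmatch("2024-02.29")))  # False: "-" then "."
```

In the full patterns the later parts are optional, so a mixed-separator string is not rejected outright; the match simply stops before the separator that disagrees.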
   
   ---
   
   ## Calendar validation helper (unchanged)
   
   ```python
   def _is_valid_calendar_date(
       year: int,
       month: Optional[int],
       day: Optional[int],
   ) -> bool:
       try:
           if month is None:
               return True
           if day is None:
               date(year, month, 1)
               return True
           date(year, month, day)
           return True
       except ValueError:
           return False
   ```
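
The helper relies on `datetime.date` raising `ValueError` for days that do not exist, which the regex alone cannot check (it accepts any day from 01 to 31 for every month). A small sketch of that behaviour, with `is_real_day` as a hypothetical name:

```python
from datetime import date


def is_real_day(year: int, month: int, day: int) -> bool:
    """Return True when the triple names a real calendar day."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


print(is_real_day(2024, 2, 29))  # True: 2024 is a leap year
print(is_real_day(2025, 2, 29))  # False: 2025 is not
print(is_real_day(2024, 4, 31))  # False: April has 30 days
```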
   
   ---
   
   ## Public API (with prefix/suffix)
   
   ```python
   def find_version_or_date(text: str) -> Optional[VersionDateMatch]:
       """
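       Find the first calendar-valid date anywhere in the text; if there is
       none, fall back to the first version token. A date wins even when a
       version appears earlier. The unmatched prefix and suffix text are
       returned alongside the match.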
   
       Returns
       -------
       VersionDateMatch | None
       """
   
       # 1️⃣ Dates first (non-overlapping)
       for m in DATE_PATTERN.finditer(text):
           year = int(m.group("year"))
           month = int(m.group("month")) if m.group("month") else None
           day = int(m.group("day")) if m.group("day") else None
   
           if _is_valid_calendar_date(year, month, day):
               return {
                   "kind": "date",
                   "value": m.group("date"),
                   "prefix": text[: m.start()],
                   "suffix": text[m.end() :],
                   "match": m,
               }
   
       # 2️⃣ Versions fallback
       m = VERSION_PATTERN.search(text)
       if m:
           return {
               "kind": "version",
               "value": m.group("version"),
               "prefix": text[: m.start()],
               "suffix": text[m.end() :],
               "match": m,
           }
   
       return None
   ```
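
Because the date pass scans the whole string before the version pass runs, a date later in the text wins over a version that appears earlier. The sketch below shows that ordering with simplified stand-in patterns (dashed dates and dotted versions only, no calendar check); `SIMPLE_DATE`, `SIMPLE_VERSION` and `first_token` are hypothetical names.

```python
import re
from typing import Optional

# Simplified stand-ins for the patterns in this comment.
SIMPLE_DATE = re.compile(r"(?<!\d)[12]\d{3}-\d{2}-\d{2}(?!\d)")
SIMPLE_VERSION = re.compile(r"(?<!\d)\d+(?:\.\d+)+(?!\d)")


def first_token(text: str) -> Optional[tuple[str, str]]:
    """Return (kind, value), searching the whole text for a date before any version."""
    m = SIMPLE_DATE.search(text)
    if m:
        return ("date", m.group(0))
    m = SIMPLE_VERSION.search(text)
    if m:
        return ("version", m.group(0))
    return None


# The version comes first in the text, but the date pass runs first and wins.
print(first_token("tool-1.4.2-2024-02-29.zip"))  # ('date', '2024-02-29')
print(first_token("tool-1.4.2.zip"))             # ('version', '1.4.2')
```

If the leftmost token should win regardless of kind, the two matches would need to be compared by `m.start()` instead.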
   
   ---
   
   ## Examples / Tests (prefix & suffix)
   
   ```python
   tests = [
       "release-2024-02-29-final",
       "v1.2.3",
       "prefix_2025_12_suffix",
       "build.0.9.1-rc1",
       "2025.12.1",            # day needs two digits, so this matches date 2025.12 with suffix ".1"
   ]
   
   for t in tests:
       r = find_version_or_date(t)
       if r:
           print(
               f"{t!r}\n"
               f"  kind   : {r['kind']}\n"
               f"  value  : {r['value']}\n"
               f"  prefix : {r['prefix']!r}\n"
               f"  suffix : {r['suffix']!r}\n"
           )
       else:
           print(f"{t!r} -> None\n")
   ```
   
   ### Expected output (conceptual)
   
   ```
   'release-2024-02-29-final'
     kind   : date
     value  : 2024-02-29
     prefix : 'release-'
     suffix : '-final'
   
   'v1.2.3'
     kind   : version
     value  : 1.2.3
     prefix : 'v'
     suffix : ''
   
   'prefix_2025_12_suffix'
     kind   : date
     value  : 2025_12
     prefix : 'prefix_'
     suffix : '_suffix'
   
   'build.0.9.1-rc1'
     kind   : version
     value  : 0.9.1
     prefix : 'build.'
     suffix : '-rc1'
   
   '2025.12.1'
     kind   : date
     value  : 2025.12
     prefix : ''
     suffix : '.1'
   ```
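
One thing worth checking: as written, `DATE_PATTERN` may stop after the month, so `2025.12.1` is picked up as the date `2025.12` with `.1` left in the suffix. If that input should be treated as a version instead, one possible adjustment is a closing lookahead that also refuses a separator followed by a digit. This is an assumed tweak, not part of the ChatGPT output, and `STRICT_DATE_PATTERN` is a hypothetical name.

```python
import re

# Same shape as DATE_PATTERN above, but the closing lookahead (an assumed tweak)
# also rejects a separator followed by a digit, so a date cannot end mid-token.
STRICT_DATE_PATTERN = re.compile(
    r"""
    (?<!\d)
    (?P<date>
        (?P<year>[12]\d{3})
        (?:
            (?P<sep>[._-])
            (?P<month>0[1-9]|1[0-2])
            (?:(?P=sep)(?P<day>0[1-9]|[12]\d|3[01]))?
        )?
    )
    (?![._-]?\d)
    """,
    re.VERBOSE,
)

for text in ["2025.12.1", "release-2024-02-29-final", "prefix_2025_12_suffix"]:
    m = STRICT_DATE_PATTERN.search(text)
    print(text, "->", m.group("date") if m else None)
# 2025.12.1 -> None
# release-2024-02-29-final -> 2024-02-29
# prefix_2025_12_suffix -> 2025_12
```

The trade-off is that a real date followed by a numeric part, such as `2024-02-29-2`, would no longer be recognised as a date, so whether this is an improvement depends on the paths actually seen.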
   
   ---
   
   ## Why this is useful
   
   * Enables **rewriting**, **normalization**, or **templating**
   * Ideal for build systems, release tooling, and APIs
   * Prefix/suffix logic avoids fragile string slicing elsewhere
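
As a rough illustration of the rewriting and normalization use case, the sketch below rebuilds a name with the date token rewritten to use dashes, leaving the prefix and suffix untouched. `FULL_DATE` and `normalize_date_separator` are hypothetical names; the pattern is a compact stand-in that only handles full year-month-day tokens and skips the calendar check.

```python
import re

# Compact stand-in for DATE_PATTERN; requires a full year-month-day token.
FULL_DATE = re.compile(
    r"(?<!\d)(?P<year>[12]\d{3})(?P<sep>[._-])(?P<month>0[1-9]|1[0-2])"
    r"(?P=sep)(?P<day>0[1-9]|[12]\d|3[01])(?!\d)"
)


def normalize_date_separator(text: str) -> str:
    """Rewrite the first full date token to use dashes, keeping prefix and suffix."""
    m = FULL_DATE.search(text)
    if m is None:
        return text  # nothing to rewrite
    prefix, suffix = text[: m.start()], text[m.end():]
    iso = f"{m.group('year')}-{m.group('month')}-{m.group('day')}"
    return prefix + iso + suffix


print(normalize_date_separator("release_2024.02.29_final"))  # release_2024-02-29_final
```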
   
   If you want:
   
   * multiple matches with spans
   * replace/normalize APIs
   * strict ISO formatting output
   * performance tuning for large inputs
   
   I can extend this cleanly.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

