dave2wave commented on issue #388: URL: https://github.com/apache/tooling-trusted-releases/issues/388#issuecomment-3702883085
Here is some version-matching code from ChatGPT that should work for all the paths I've observed. It will need extending for cases with both a Scala version and a release version. See https://chatgpt.com/s/t_69558cae4ee48191888f1ed42cc8ada8

Below is a **small, clean extension** of the existing API that returns the **unmatched prefix and suffix** around the detected token, while preserving all prior guarantees:

* ✅ calendar-accurate dates
* ✅ date beats version
* ✅ no overlaps
* ✅ named capture groups
* ✅ consistent separators

---

## Returned structure

```python
{
    "kind": "date" | "version",
    "value": "matched token",
    "prefix": "text before match",
    "suffix": "text after match",
    "match": re.Match,
}
```

---

## Updated implementation

```python
import re
from typing import Optional, Match, Literal, TypedDict
from datetime import date


class VersionDateMatch(TypedDict):
    kind: Literal["date", "version"]
    value: str
    prefix: str
    suffix: str
    match: Match[str]


# ---------------------------
# DATE REGEX (structure only)
# ---------------------------
DATE_PATTERN = re.compile(
    r"""
    (?<!\d)
    (?P<date>
        (?P<year>[12]\d{3})
        (?:
            (?P<sep>[._-])
            (?P<month>0[1-9]|1[0-2])
            (?:
                (?P=sep)
                (?P<day>0[1-9]|[12]\d|3[01])
            )?
        )?
    )
    (?!\d)
    """,
    re.VERBOSE,
)

# ---------------------------
# VERSION REGEX
# ---------------------------
VERSION_PATTERN = re.compile(
    r"""
    (?<!\d)
    (?P<version>
        (?P<major>0|[1-9]\d*)
        (?:
            (?P<sep>[._-])
            (?P<minor>0|[1-9]\d*)
            (?:
                (?P=sep)
                (?P<patch>0|[1-9]\d*)
                (?:
                    (?P=sep)
                    (?P<build>0|[1-9]\d*)
                )?
            )?
        )?
    )
    (?!\d)
    """,
    re.VERBOSE,
)
```

---

## Calendar validation helper (unchanged)

```python
def _is_valid_calendar_date(
    year: int,
    month: Optional[int],
    day: Optional[int],
) -> bool:
    try:
        if month is None:
            return True
        if day is None:
            date(year, month, 1)
            return True
        date(year, month, day)
        return True
    except ValueError:
        return False
```

---

## Public API (with prefix/suffix)

```python
def find_version_or_date(text: str) -> Optional[VersionDateMatch]:
    """
    Find the first calendar-valid date or version token
    and return the unmatched prefix and suffix text.

    Returns
    -------
    VersionDateMatch | None
    """
    # 1️⃣ Dates first (non-overlapping)
    for m in DATE_PATTERN.finditer(text):
        year = int(m.group("year"))
        month = int(m.group("month")) if m.group("month") else None
        day = int(m.group("day")) if m.group("day") else None

        if _is_valid_calendar_date(year, month, day):
            return {
                "kind": "date",
                "value": m.group("date"),
                "prefix": text[: m.start()],
                "suffix": text[m.end() :],
                "match": m,
            }

    # 2️⃣ Versions fallback
    m = VERSION_PATTERN.search(text)
    if m:
        return {
            "kind": "version",
            "value": m.group("version"),
            "prefix": text[: m.start()],
            "suffix": text[m.end() :],
            "match": m,
        }

    return None
```

---

## Examples / Tests (prefix & suffix)

```python
tests = [
    "release-2024-02-29-final",
    "v1.2.3",
    "prefix_2025_12_suffix",
    "build.0.9.1-rc1",
    "2025.12.1",  # "1" is not a two-digit day, so this matches as the date 2025.12
]

for t in tests:
    r = find_version_or_date(t)
    if r:
        print(
            f"{t!r}\n"
            f"  kind   : {r['kind']}\n"
            f"  value  : {r['value']}\n"
            f"  prefix : {r['prefix']!r}\n"
            f"  suffix : {r['suffix']!r}\n"
        )
    else:
        print(f"{t!r} -> None\n")
```

### Expected output (conceptual)

```
'release-2024-02-29-final'
  kind   : date
  value  : 2024-02-29
  prefix : 'release-'
  suffix : '-final'

'v1.2.3'
  kind   : version
  value  : 1.2.3
  prefix : 'v'
  suffix : ''

'prefix_2025_12_suffix'
  kind   : date
  value  : 2025_12
  prefix : 'prefix_'
  suffix : '_suffix'

'build.0.9.1-rc1'
  kind   : version
  value  : 0.9.1
  prefix : 'build.'
  suffix : '-rc1'

'2025.12.1'
  kind   : date
  value  : 2025.12
  prefix : ''
  suffix : '.1'
```

Note that because dates are tried first and the month and day are optional, any standalone four-digit number starting with 1 or 2 is reported as a date (a bare year), so `2025.12.1` is matched as the year-month date `2025.12` rather than as a three-part version.

---

## Why this is useful

* Enables **rewriting**, **normalization**, or **templating**
* Ideal for build systems, release tooling, and APIs
* Prefix/suffix logic avoids fragile string slicing elsewhere

If you want:

* multiple matches with spans
* replace/normalize APIs
* strict ISO formatting output
* performance tuning for large inputs

I can extend this cleanly.
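
For the case noted at the top, where a path carries both a Scala version and a release version, one possible direction is to collect every version token rather than only the first. The following is a minimal sketch under stated assumptions, not part of the code above: `_DOTTED_VERSION`, `ScalaRelease`, and `split_scala_and_release` are hypothetical names, the regex is a simplified dot-only stand-in for `VERSION_PATTERN` so the block runs on its own, and it assumes the Scala version is the first token and the release version the second (as in the illustrative input `kafka_2.13-3.6.0.tgz`).

```python
import re
from typing import NamedTuple, Optional

# Simplified, dot-separated stand-in for VERSION_PATTERN (an assumption made
# so that this sketch is self-contained). Requires at least "major.minor".
_DOTTED_VERSION = re.compile(r"(?<!\d)\d+(?:\.\d+){1,3}(?!\d)")


class ScalaRelease(NamedTuple):
    prefix: str   # text before the first version token
    scala: str    # first version token (assumed to be the Scala version)
    release: str  # second version token (assumed to be the release version)
    suffix: str   # text after the second version token


def split_scala_and_release(text: str) -> Optional[ScalaRelease]:
    """Hypothetical helper: return the first two version tokens in text.

    Returns None when fewer than two version tokens are found. The text
    between the two tokens (typically a single separator) is not returned.
    """
    tokens = list(_DOTTED_VERSION.finditer(text))
    if len(tokens) < 2:
        return None
    scala, release = tokens[0], tokens[1]
    return ScalaRelease(
        prefix=text[: scala.start()],
        scala=scala.group(0),
        release=release.group(0),
        suffix=text[release.end() :],
    )


print(split_scala_and_release("kafka_2.13-3.6.0.tgz"))
```

Which token is the Scala version is a naming convention rather than something the regex can tell, so a real implementation would probably need a per-project rule or an explicit hint.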
