hussein-awala commented on PR #64610:
URL: https://github.com/apache/airflow/pull/64610#issuecomment-4657453435

   Sorry for the late reply @pierrejeambrun, and thanks for the review.
   
   You raise a fair point about consistency, and I agree that introducing regex 
deserves a dedicated discussion. Let me explain the concrete gap so we can 
decide together.
   
   The existing `pattern` (substring `ILIKE '%term%'`) and `prefix_pattern` 
(prefix range scan) filters are great for free-text columns like `dag_id`, but 
they fall short for **partition keys**, which are typically *structured 
composite strings* (e.g. `us|2024-01-15`, `region=us/date=2024-01-15`). Below are 
a few queries that are common for partition keys but that `pattern` and 
`prefix_pattern` cannot express:
   
   **1. The `|` delimiter collides with the OR operator**
   
   Both filters treat `|` as logical OR (`val_str.split("|")`). But partition 
keys very frequently use `|` as a field separator (see this PR's own examples 
like `us|2026-03-10`). So:
   - `prefix_pattern=us|2024-01-15` is parsed as `OR(prefix "us", prefix 
"2024-01-15")` — not a match on the literal key.
   - With regex you escape it: `partition_key_pattern=^us\|2024-01-15$`.
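
To make the collision concrete, here is a minimal sketch of the two behaviours. It assumes the OR handling amounts to `val_str.split("|")` followed by a prefix test per term; `prefix_or_match` is a hypothetical helper, and Python's `re` module only stands in for whatever regex engine the database provides.

```python
import re


def prefix_or_match(val_str, key):
    """Hypothetical stand-in for `prefix_pattern`: split on "|", OR the prefixes."""
    terms = val_str.split("|")
    return any(key.startswith(term) for term in terms)


keys = ["us|2024-01-15", "us|2023-12-31", "2024-01-15|eu", "eu|2024-01-15"]

# The literal key is split into the two terms "us" and "2024-01-15".
prefix_hits = [k for k in keys if prefix_or_match("us|2024-01-15", k)]

# Escaping the delimiter keeps it a literal character in the regex.
literal_key = re.compile(r"^us\|2024-01-15$")
regex_hits = [k for k in keys if literal_key.search(k)]

print(prefix_hits)  # three keys, two of them unwanted
print(regex_hits)   # only the exact key
```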
   
   **2. Suffix / "ends with" matching**
   
   Find all partitions for a given date regardless of the region prefix:
   - regex: `partition_key_pattern=\|2024-01-15$`
   - `pattern` can only do an unanchored substring match (`%2024-01-15%`), which 
also matches `2024-01-15T09:00`; `prefix_pattern` only matches from the start. 
Neither can anchor to the end.
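
A small sketch of the difference, with the `ILIKE '%term%'` filter approximated by a case-insensitive substring test and Python's `re` standing in for the database regex engine (the sample keys are made up for illustration):

```python
import re

keys = ["us|2024-01-15", "eu|2024-01-15", "us|2024-01-15T09:00", "us|2024-01-16"]

# Approximation of ILIKE '%2024-01-15%': case-insensitive, unanchored.
term = "2024-01-15"
substring_hits = [k for k in keys if term.lower() in k.lower()]

# Anchored to the end of the key, preceded by the literal "|" delimiter.
date_suffix = re.compile(r"\|2024-01-15$")
suffix_hits = [k for k in keys if date_suffix.search(k)]

print(substring_hits)  # also includes the timestamped key
print(suffix_hits)     # only keys that end with the date
```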
   
   **3. Character-class / format constraints**
   
   Match only well-formed date partitions for a region:
   - regex: `partition_key_pattern=^us\|\d{4}-\d{2}-\d{2}$`
   - LIKE's `_` matches *any* character, so `us|2024-__-__` would also match 
`us|2024-ab-cd`. There's no way to require digits.
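
The sketch below illustrates the point by emulating LIKE semantics in Python. `like_to_regex` is a hypothetical helper written only for this comparison (it ignores LIKE escape characters), and Python's `\d` is assumed to behave like the database's digit class for ASCII input.

```python
import re


def like_to_regex(like_pattern):
    """Hypothetical helper: translate a LIKE pattern ("_" and "%") to a regex."""
    parts = []
    for ch in like_pattern:
        if ch == "_":
            parts.append(".")    # "_" matches exactly one arbitrary character
        elif ch == "%":
            parts.append(".*")   # "%" matches any run of characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


keys = ["us|2024-01-15", "us|2024-ab-cd", "eu|2024-01-15"]

like_pattern = like_to_regex("us|2024-__-__")
like_hits = [k for k in keys if like_pattern.fullmatch(k)]

well_formed = re.compile(r"^us\|\d{4}-\d{2}-\d{2}$")
regex_hits = [k for k in keys if well_formed.search(k)]

print(like_hits)   # the malformed "us|2024-ab-cd" slips through
print(regex_hits)  # only the key with digits in every date field
```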
   
   **4. Structured mid-string alternation**
   
   Match either region, but only for a specific month:
   - regex: `partition_key_pattern=^(us|eu)\|2024-03`
   - The `|` OR in `pattern`/`prefix_pattern` splits the whole term, so it 
can't express "us OR eu, each followed by `|2024-03`".
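
As a rough illustration, assume someone tries `prefix_pattern=us|eu|2024-03` to approximate this query. The helper `prefix_or_match` is hypothetical and models the split-then-OR behaviour described above; Python's `re` stands in for the database regex engine.

```python
import re


def prefix_or_match(val_str, key):
    """Hypothetical stand-in for `prefix_pattern`: split on "|", OR the prefixes."""
    return any(key.startswith(term) for term in val_str.split("|"))


keys = [
    "us|2024-03-10",
    "eu|2024-03-22",
    "ap|2024-03-10",
    "us|2024-04-01",
    "eu|2023-03-05",
]

# Splits into three independent terms: "us", "eu", "2024-03".
prefix_hits = [k for k in keys if prefix_or_match("us|eu|2024-03", k)]

# The alternation is scoped to the region field; the month applies to both.
region_month = re.compile(r"^(us|eu)\|2024-03")
regex_hits = [k for k in keys if region_month.search(k)]

print(prefix_hits)  # every us/eu key, regardless of month
print(regex_hits)   # only us/eu keys in 2024-03
```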

