dstandish commented on PR #64571:
URL: https://github.com/apache/airflow/pull/64571#issuecomment-4577677517

   Hey @Lee-W 
   
   I was a little surprised to see so many lines for this change so I asked 
Claude to help me review whether there's any unnecessary complexity.  Below are 
its findings. They seem plausible.  What do you think?
   
   ---
   
     The core design is clean (the Window types are tiny generators,
     RollupMapper.to_upstream is a clear decode→expand→encode, and the 
scheduler gate comes down to one
     expected.issubset(actual) check), and the test coverage on the scheduler 
paths is excellent. The
     HA hardening in _create_dagruns_for_partitioned_asset_dags (bulk-fetch 
over the old N+1,
     with_row_locks(skip_locked=True), the per-tick cap and deterministic 
order_by) all looks right.
   
     One theme I'd like to resolve before merge: a meaningful chunk of 
complexity is self-induced,
     concentrated in partition_mappers/temporal.py.
   
     _compile_output_format_regex looks like more generality than the feature 
needs. Only two mappers
     can't round-trip through strptime: StartOfWeekMapper (%V isn't 
strptime-parseable) and
     StartOfQuarterMapper ({quarter} isn't a directive). The actual requirement 
is "recover Y/m/d" and
     "recover Y/quarter". But the decode that motivated the whole compiler ends 
up discarding the token
     it was built for:
   
     # StartOfWeekMapper.decode_downstream — %V is captured but never read
     return datetime(int(match["Y"]), int(match["m"]), int(match["d"]))
   
     Both decodes collapse to a dedicated per-mapper regex:
   
     _WEEK_RE    = re.compile(r"(?P<Y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})")
     _QUARTER_RE = re.compile(r"(?P<Y>\d{4})-Q(?P<quarter>[1-4])")
   
     That drops the directive table, the {name} placeholder machinery, the 
placeholder_patterns arg,
     and the five compile-time ValueError branches — plus the test class that 
only exists to exercise
     that invented surface (test_rejects_adjacent_default_pattern_placeholders,
     test_adjacent_placeholders_allowed_when_one_is_narrowed,
     test_separator_between_default_placeholders_is_allowed). Roughly ~120 
lines + tests, with
     identical behavior for every shipped mapper. Could we go with the 
dedicated regexes for now and
     add generality if/when a custom mapper actually needs it?
   
     Follow-on: if the temporal decode simplifies, the base-class guard 
scaffolding in
     partition_mappers/base.py — the __init_subclass__ XOR check, 
expected_decoded_type, and the
     runtime pairing check in RollupMapper.__init__ — largely loses its purpose 
for the shipped
     mappers, since it mostly exists to make the general decode/encode pair 
safe. Worth reassessing
     whether it belongs now or arrives with the first real custom mapper.
   
     Question (non-blocking): the audit-log path (_record_partition_audit_log /
     _record_stale_apdr_audit_log) opens independent scoped=False sessions to 
survive an outer
     rollback, plus a process-lifetime dedup set. For advisory Log rows where 
self.log.exception(...)
     already records the same thing, is the rollback-survival guarantee worth 
the machinery for v1, or
     would a plain log line do?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to