uranusjr edited a comment on issue #19058:
URL: https://github.com/apache/airflow/issues/19058#issuecomment-946580267
OK, I think I know what's going on. The log handler uses a template to
render the log's filename, which (by default) is `{{ ti.dag_id }}/{{ ti.task_id
}}/{{ ts }}/{{ try_number }}.log`. The problem is the `{{ ts }}` part. Prior to
AIP-39, this was simply the ISO 8601 rendition of `execution_date`, but AIP-39
decoupled a DAG run's _identifying timestamp_ (`logical_date`) from its _operating
period_ (`data_interval`), splitting `execution_date`'s semantic meaning into
two. This means we needed to decide which of the two meanings each of the "derived"
variables, such as `ts` and `ts_nodash`, should use. We chose
`data_interval.start` in 2.2.0 because we guessed that's what most people would
want.
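To make the difference concrete, here is a minimal sketch of how the rendered log path changes depending on which timestamp backs `ts`. It uses plain `str.format` as a stand-in for the real Jinja template, and the DAG ID, task ID, dates, and the `render_log_filename` helper are all hypothetical, not Airflow's actual code.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the default template; the real one is a Jinja
# template rendered by the log handler.
LOG_FILENAME_TEMPLATE = "{dag_id}/{task_id}/{ts}/{try_number}.log"


def render_log_filename(dag_id, task_id, ts_source, try_number):
    """Render the log path, using ts_source as the value behind `ts`."""
    return LOG_FILENAME_TEMPLATE.format(
        dag_id=dag_id,
        task_id=task_id,
        ts=ts_source.isoformat(),
        try_number=try_number,
    )


# Assumed example values for a single DAG run.
logical_date = datetime(2021, 10, 19, 8, 30, tzinfo=timezone.utc)
data_interval_start = datetime(2021, 10, 18, tzinfo=timezone.utc)

# Pre-AIP-39 behavior: ts follows execution_date (now logical_date).
path_from_logical_date = render_log_filename("my_dag", "my_task", logical_date, 1)
# 2.2.0 behavior: ts follows data_interval.start.
path_from_interval_start = render_log_filename("my_dag", "my_task", data_interval_start, 1)

print(path_from_logical_date)
print(path_from_interval_start)
```

The two paths differ whenever `logical_date` and `data_interval.start` differ, so the same run resolves to different log files depending on the meaning assigned to `ts`.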
Another change we made when we decoupled `logical_date` from `data_interval`
was to "fix" manual DAG runs' data interval. Prior to AIP-39, since a DAG run's
operating period was inferred from `execution_date`, a manual DAG run's data
interval was nonsensical: `execution_date` was set to when the run was
triggered, which has no logical end time at all. So 2.2.0 introduced new logic
to "align" a manual run's data interval to match the _most recent completed
schedule_, while keeping its `logical_date` at the same value
`execution_date` had previously. But this introduces a problem for log file
identification with `ts`, as shown here.
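One plausible way this can go wrong is sketched below, assuming a daily schedule. The `align_manual_run_daily` and `log_path` helpers are hypothetical simplifications, not Airflow's actual alignment logic: two manual runs triggered on the same day are aligned to the same completed interval, so a `ts` derived from `data_interval.start` no longer tells them apart.

```python
from datetime import datetime, timedelta, timezone


def align_manual_run_daily(triggered_at):
    """Return (start, end) of the most recent completed daily interval.

    Simplified assumption: intervals run midnight to midnight in UTC.
    """
    end = triggered_at.replace(hour=0, minute=0, second=0, microsecond=0)
    return end - timedelta(days=1), end


def log_path(ts_value):
    """Build a log path using ts_value as the value behind `ts`."""
    return f"my_dag/my_task/{ts_value.isoformat()}/1.log"


# Two manual runs triggered at different times on the same day.
first_trigger = datetime(2021, 10, 19, 8, 30, tzinfo=timezone.utc)
second_trigger = datetime(2021, 10, 19, 14, 45, tzinfo=timezone.utc)

first_start, _ = align_manual_run_daily(first_trigger)
second_start, _ = align_manual_run_daily(second_trigger)

# With ts taken from data_interval.start, both runs map to one path.
collides_with_interval_start = log_path(first_start) == log_path(second_start)
# With ts taken from logical_date (the trigger time), the paths stay distinct.
collides_with_logical_date = log_path(first_trigger) == log_path(second_trigger)

print(collides_with_interval_start, collides_with_logical_date)
```

Under these assumptions, only the `logical_date`-based path identifies each manual run uniquely.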
So the easiest way out here is to change the default log filename template
to use `{{ logical_date|ts }}` instead of `{{ ts }}`. But this would also mean
that any user-specified custom `log_filename_template` configuration would
still be broken and need to be migrated, which does not sound viable (and would be
compatibility-breaking). Therefore, I think the only viable fix available is to
roll back the semantic change we made to `ts`, `ts_nodash`, etc. so they again
indicate `execution_date`, i.e. `logical_date`. This is quite unfortunate since
it'd make migration from the pre-AIP-39 implicit data interval to the modern data
interval-based semantics more difficult, but it is probably the only reasonable
approach.