Sorry for the late comment - I just got to that one. I am ok with the change, I might have one small comment that is worth considering (but not a blocker). The "=" sign has special meaning in a number of shells and requires quoting and when referred to via URL it requires %-encoding - it's one of the RFC 3986 section 2.2 Reserved Characters
This is not a "huge" issue but it makes it a bit easier for any kind of manual, debugging, listing etc to use characters that do not require either. I looked at allowed task_id and dag_ids: KEY_REGEX = re.compile(r'^[\w.-]+$') GROUP_KEY_REGEX = re.compile(r'^[\w-]+$') and list of unreserved RFC3986 characters A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 - _ . ~ And it seems to be that either _ or ~ would be a better choice for separator (especially the ~ I think makes very close resemblance to =) drwxrwxr-x 3 jarek jarek 4096 Feb 15 18:24 . drwxr-xr-x 77 jarek jarek 16384 Feb 13 17:29 .. -rw-rw-r-- 1 jarek jarek 0 Feb 15 18:16 'a=b' -rw-rw-r-- 1 jarek jarek 0 Feb 15 18:24 a_b -rw-rw-r-- 1 jarek jarek 0 Feb 15 18:21 a~b But this is just a suggestion - I am also ok with '=' but we have to realise it makes it slightly more annoying for the users. J. On Thu, Feb 10, 2022 at 5:54 PM Ash Berlin-Taylor <[email protected]> wrote: > Hi all, > > As part of AIP-42 we need to split out the task instance log files to > include the "map index". The simple way of doing this would be just to add > another sub-folder, so for instance, we'd have > > {{ dag_id }}/{{ task_id }}/{{ logical_date }}/{{ map_index }}/{{ > try_number }}.log > maptest/consumer/2022-02-09T00:00:00+00:00/0/1.log > > Which is getting "deep" and hard to know what each component is. So Daniel > Standish suggested that maybe we could use the "hive partition style", so > like this: > > dag={{ dag_id }}/task_id={{ task_id }}/logical_date={{ logical_date > }}/map_index={{ map_index }}/attempt={{ try_number }}.log > > dag=maptest/task_id=consumer/logical_date=backfill__2022-02-09T00:00:00+00:00/map_index=0/attempt=1.log > > That also made me realise that the "order" of the components is mixed (as > task_id is "smaller" than date") so think a better order would be > > dag={{ dag_id }}/logical_date={{ logical_date }}/task_id={{ task_id > }}/map_index={{ map_index }}/attempt={{ try_number }}.log > > (I haven't shown it here but the actual template only includes map_index > for mapped tasks.) > > For example the logs folder would have these sort of files in them > (showing a dag with both a mapped and an normal task) > > $ tree ~/airflow/logs /home/ash/airflow/logs ├── dag=maptest │ ├── > run_id=backfill__2022-02-10T00:00:00+00:00 │ │ ├── task_id=consumer │ > │ │ ├── map_index=0 │ │ │ │ └── attempt=1.log │ │ │ ├── > map_index=1 │ │ │ │ └── attempt=1.log │ │ │ └── map_index=2 > │ │ │ └── attempt=1.log │ │ └── task_id=make_list │ │ └── > attempt=1.log └── scheduler ├── 2022-02-10 └── latest -> > /home/ash/airflow/logs/scheduler/2022-02-10 > > We've (well TP) has already handled the historic problem where changing > the log template made old logs unviewable in the UI so that isn't an issue > anymore, so we can change the template as we like. > > I'm asking for lazy consensus to change the log_filename_template to: > - use run_id instead of logical_date > - use key=value hive partition style. > - move run_id (in place of date) to before task_id > > I'm hoping to merge this PR soon (this week) but if we don't get consensus > on the above by Tuesday morning European time we can change the template > back. > > I have this done in a draft PR here > https://github.com/apache/airflow/pull/21495/files > > Thanks, > Ash >
