Gollum999 commented on issue #23020:
URL: https://github.com/apache/airflow/issues/23020#issuecomment-1204302199

   For the sake of discussion, how do people feel about letting these names 
entirely *replace* `map_index`?
   
   My primary motivation is that an integer-based `map_index` stops lining up 
with the same inputs if the list of mapped tasks changes when re-running a DAG.  
In my experience, you can end up with missing and/or duplicate tasks due to the 
mismatched indices.  As a 
result you are generally forced to re-run all of the mapped task instances (if 
not the entire DAG Run), even if 99% of the tasks are unchanged.  I can give 
more concrete examples if anyone is interested.
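
   To make the index problem concrete, here is a minimal sketch in plain Python 
(no Airflow involved).  The file names and the `index_by_position` helper are 
made up for illustration; the only assumption is that each mapped task is 
identified by its position in the input list.

```python
def index_by_position(items):
    """Mimic an integer map_index: each input is identified by its list position."""
    return {i: item for i, item in enumerate(items)}


# Hypothetical inputs: "a.csv" disappears and "d.csv" appears before the re-run.
first_run = index_by_position(["a.csv", "b.csv", "c.csv"])
rerun = index_by_position(["b.csv", "c.csv", "d.csv"])

# Indices that now point at a different input than they did in the first run.
mismatched = [i for i in first_run if rerun.get(i) != first_run[i]]

# Inputs that are actually identical across both runs.
unchanged_inputs = sorted(set(first_run.values()) & set(rerun.values()))

print(mismatched)        # every index is affected
print(unchanged_inputs)  # yet two of the three inputs did not change
```

   Every index ends up pointing at a different input even though two of the 
three inputs are unchanged, which is why everything has to be re-run.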
   
   I'd propose letting the mapped args become (part of?) the primary key for 
the mapped task.  This would solve OP's goal of improving the UX of mapped 
tasks (no more abstract indices), but would also allow for some amount of 
consistency between runs (not just for re-runs, but potentially between 
separate DAG Runs as well).
   
   Conceptually I think this is doable since the args already must be JSON 
serializable, but I'm sure the implementation would be more complex than I am 
imagining.
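
   As a rough sketch of what "args as key" could look like, assuming the mapped 
kwargs are JSON serializable: the `mapped_key` helper below is hypothetical and 
not part of Airflow, and hashing is just one possible way to keep the key short.

```python
import hashlib
import json


def mapped_key(kwargs: dict) -> str:
    """Derive a stable key from JSON-serializable mapped kwargs (hypothetical helper)."""
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(kwargs, sort_keys=True, separators=(",", ":"))
    # Truncated digest for readability; a real implementation would need to
    # decide how to handle collisions.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


key_a = mapped_key({"path": "b.csv", "mode": "full"})
key_b = mapped_key({"mode": "full", "path": "b.csv"})  # same args, different order
key_c = mapped_key({"path": "c.csv", "mode": "full"})
```

   The same args produce the same key no matter where they sit in the input 
list, so an unchanged input could in principle be matched up across runs.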
   
   A couple of considerations that come to mind:
   1. These IDs ultimately need to be unique.  So something like `map_index` 
might still be needed to augment the key and resolve duplicates.  I imagine 
this could be similar to how duplicate task_ids are resolved (`task`, 
`task__1`, etc.).
   2. The values used to generate the list of dynamic tasks could be very large 
and complex.  So it seems like you would need the ability to override the 
generated name with a "short name" when args are too complex.
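
   Both considerations can be sketched in a few lines.  Everything here is 
hypothetical (`unique_names`, `display_name`, the `short_name` parameter, and the 
40-character limit are placeholders, not existing Airflow APIs); the `__N` suffix 
just follows the `task`, `task__1` pattern mentioned above.

```python
import json


def unique_names(names):
    """Append __1, __2, ... to repeated names so every result is unique."""
    taken = set()
    result = []
    for name in names:
        candidate, n = name, 0
        while candidate in taken:
            n += 1
            candidate = f"{name}__{n}"
        taken.add(candidate)
        result.append(candidate)
    return result


def display_name(kwargs, short_name=None, max_len=40):
    """Prefer an explicit short name; otherwise fall back to (truncated) JSON."""
    if short_name is not None:
        return short_name
    text = json.dumps(kwargs, sort_keys=True)
    return text if len(text) <= max_len else text[: max_len - 3] + "..."


deduped = unique_names(["task", "task", "other", "task"])
overridden = display_name({"rows": list(range(1000))}, short_name="big_batch")
truncated = display_name({"rows": list(range(1000))})
```

   Note that the suffix in `unique_names` still depends on input order, so 
duplicates would only be as stable across runs as `map_index` is today.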


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

