EdwardRadical commented on issue #25165:
URL: https://github.com/apache/airflow/issues/25165#issuecomment-1203698383

   No, @uranusjr, there is more to it. This is the complete snippet:
   
   ```
       task_group = task_group or TaskGroupContext.get_current_task_group(dag)
       tg_task_id = task_group.child_id(task_id) if task_group else task_id
   
       if tg_task_id not in dag.task_ids:
           return task_id
   ```
   
   This happens at the time `expand` is called: albeit `tg_task_id` indeed 
contains the correct label, it is ignored because `tg_task_id not in 
dag.task_ids` evaluates to `True` (for obvious reasons), so the naked `task_id` 
is returned by `get_unique_task_id`.
   
   And this was probably a rough two-liners to fix the obvious collision [with 
this other 
snippet](https://github.com/apache/airflow/blob/3138604b264878f27505223bd14c7814eacc1e57/airflow/models/baseoperator.py#L759-L762)
 inside `BaseOperator`'s `__init__` which calls the exact same function 
(`task_group.child_id(task_id)`):
   
   ```
           if task_group:
               self.task_id = task_group.child_id(task_id)
           else:
               self.task_id = task_id
   ```
   
   So while `get_unique_task_id` does **_not_** return an unique task_id in 
case the task belongs to a group, for most use cases the task_id is fixed later 
by `BaseOperator`, but the snippet above is not contained in `MappedOperator`'s 
`__init__`, and `MappedOperator` does not inherit from `BaseOperator` either so 
it's not going to run the `__init__` from it - and as a consequence of these, 
the result will be a duplicate.
   
   The reason why I am explaining this to you, @uranusjr, albeit I know where 
the issue is, I know how it manifests, and I can maybe patch it, is that I 
recognize I am not the one with the most knowledge of the environment here and 
any fix I can create will most likely not be in the spirit of the architecture, 
as a major contributor to AIP-42, _you_ are that guy with expert knowledge: so 
I can continue to convince you for a couple messages, give up, and create a PR 
out of my flawed expert knowledge, or in the spirit of OSS you can hear me out 
and I can find how to fix it best without introducing dangerous assumptions or 
unsound logic. 
   
   Given how the bug seem to happen, this is what it would need to fix the 
issue, to the best of my knowledge:
   
   ```
   ===================================================================
   diff --git a/airflow/decorators/base.py b/airflow/decorators/base.py
   --- a/airflow/decorators/base.py     (revision 
298be502c35006b7c3f011b676dbb4db0633bc74)
   +++ b/airflow/decorators/base.py     (date 1659517989535)
   @@ -359,6 +359,9 @@
            partial_kwargs.update(task_kwargs)
    
            task_id = get_unique_task_id(partial_kwargs.pop("task_id"), dag, 
task_group)
   +        if task_group:
   +            task_id = task_group.child_id(task_id)
   +
            params = partial_kwargs.pop("params", None) or default_params
    
            # Logic here should be kept in sync with BaseOperatorMeta.partial().
   ```
   
   Let me know what you think of it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to