EdwardRadical commented on issue #25165:
URL: https://github.com/apache/airflow/issues/25165#issuecomment-1203698383
No, @uranusjr, there is more to it. This is the complete snippet:
```
task_group = task_group or TaskGroupContext.get_current_task_group(dag)
tg_task_id = task_group.child_id(task_id) if task_group else task_id
if tg_task_id not in dag.task_ids:
return task_id
```
This happens at the time `expand` is called: albeit `tg_task_id` indeed
contains the correct label, it is ignored because `tg_task_id not in
dag.task_ids` evaluates to `True` (for obvious reasons), so the naked `task_id`
is returned by `get_unique_task_id`.
And this was probably a rough two-liners to fix the obvious collision [with
this other
snippet](https://github.com/apache/airflow/blob/3138604b264878f27505223bd14c7814eacc1e57/airflow/models/baseoperator.py#L759-L762)
inside `BaseOperator`'s `__init__` which calls the exact same function
(`task_group.child_id(task_id)`):
```
if task_group:
self.task_id = task_group.child_id(task_id)
else:
self.task_id = task_id
```
So while `get_unique_task_id` does **_not_** return an unique task_id in
case the task belongs to a group, for most use cases the task_id is fixed later
by `BaseOperator`, but the snippet above is not contained in `MappedOperator`'s
`__init__`, and `MappedOperator` does not inherit from `BaseOperator` either so
it's not going to run the `__init__` from it - and as a consequence of these,
the result will be a duplicate.
The reason why I am explaining this to you, @uranusjr, albeit I know where
the issue is, I know how it manifests, and I can maybe patch it, is that I
recognize I am not the one with the most knowledge of the environment here and
any fix I can create will most likely not be in the spirit of the architecture,
as a major contributor to AIP-42, _you_ are that guy with expert knowledge: so
I can continue to convince you for a couple messages, give up, and create a PR
out of my flawed expert knowledge, or in the spirit of OSS you can hear me out
and I can find how to fix it best without introducing dangerous assumptions or
unsound logic.
Given how the bug seem to happen, this is what it would need to fix the
issue, to the best of my knowledge:
```
===================================================================
diff --git a/airflow/decorators/base.py b/airflow/decorators/base.py
--- a/airflow/decorators/base.py (revision
298be502c35006b7c3f011b676dbb4db0633bc74)
+++ b/airflow/decorators/base.py (date 1659517989535)
@@ -359,6 +359,9 @@
partial_kwargs.update(task_kwargs)
task_id = get_unique_task_id(partial_kwargs.pop("task_id"), dag,
task_group)
+ if task_group:
+ task_id = task_group.child_id(task_id)
+
params = partial_kwargs.pop("params", None) or default_params
# Logic here should be kept in sync with BaseOperatorMeta.partial().
```
Let me know what you think of it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]