Hi everyone,
The context for this discussion is a recent PR that applied AIR201-like
replacements to the codebase and had to be reverted [1].
It turns out that using task.output["key"] in place of {{
xcom_pull('task')['key'] }} is only correct when `multiple_outputs` is
True, because task.output["key"] looks up an XCom stored under "key", and
that XCom only exists when the returned dict is unpacked into one XCom per
key. This brings me to the matter at hand: the default value of
`multiple_outputs` is not constant; it is inferred from the task's return
type hint, for example:

@task
def a():              # multiple_outputs = False
    return {"x": 1}

@task
def b() -> dict:      # multiple_outputs = True  ← inferred
    return {"x": 1}

While the effects of multiple_outputs are mentioned in the docs [2], the
effect of the type hint on behavior is, at best, implied [3]. Thus, an
unsuspecting user who adds a return type hint might not expect the change
in behavior they will later observe.
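
To make the inference concrete, here is a rough sketch of how a default
could be derived from the return annotation. It is a simplified stand-in
and not Airflow's actual implementation; the helper name
`infer_multiple_outputs` is hypothetical.

```python
import inspect
import typing
from collections.abc import Mapping


def infer_multiple_outputs(func) -> bool:
    """Hypothetical helper: True only if the return hint is a dict-like type."""
    annotation = inspect.signature(func).return_annotation
    if annotation is inspect.Signature.empty:
        return False  # no return hint, so there is nothing to infer from
    # Unwrap parametrized hints such as dict[str, int] down to dict.
    origin = typing.get_origin(annotation) or annotation
    return isinstance(origin, type) and issubclass(origin, Mapping)


def a():
    return {"x": 1}


def b() -> dict:
    return {"x": 1}


def c() -> dict[str, int]:
    return {"x": 1}
```

Under this sketch, `a` yields False while `b` and `c` yield True, even
though all three return the same value.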

Without getting into whether multiple_outputs inference is good UX, I'd
like to propose a new Ruff rule that detects when a task either returns a
dict or is annotated with a dict return type, and tells the user to
specify the multiple_outputs value explicitly.
Such a rule would have several benefits: 1) make the intent of the Dag
author explicit; 2) make the code robust to future changes in the
inference logic; 3) increase awareness of `multiple_outputs`.

Let me know what you think.

Best,
Dev-iL

[1]: https://github.com/apache/airflow/pull/66712#issuecomment-4424585768
[2]: https://airflow.apache.org/docs/apache-airflow/stable/tutorial/taskflow.html#step-2-write-your-tasks-with-task
[3]: https://airflow.apache.org/docs/apache-airflow/stable/howto/create-custom-decorator.html#optional-adding-ide-auto-completion-support
