foldvari commented on issue #51821: URL: https://github.com/apache/airflow/issues/51821#issuecomment-4170954893
@potiuk I think we might be dismissing the previous behavior too quickly as “accidental” or “undefined,” and I’d like to challenge that a bit. First, the Airflow 2.x documentation explicitly states that `task_ids=None` removes the filter. That strongly suggests the behavior was intentional (or at least relied upon as defined behavior), not incidental. Users reasonably built logic on top of that contract. Second, the new default in Airflow 3—falling back to the current task’s `task_id`—doesn’t seem to provide meaningful value. In practice, it won’t return anything useful in most real scenarios, since tasks typically don’t consume their own XCom outputs within the same run. So we effectively replaced a broadly useful default with one that is almost always a no-op. More importantly, the change introduces real usability regressions: * It forces explicit coupling to upstream `task_id`s, which makes DAGs more fragile under refactoring. * It makes reusable/modular DAG components harder to design, since consumers now need knowledge of upstream internals. * It removes a simple and practical pattern: “give me the latest value for this key regardless of producer.” Regarding the “ambiguity” concern: I don’t think the prior behavior is inherently undefined. A reasonable and consistent interpretation is: > return the most recent XCom for the given key within the DAG scope. Yes, edge cases exist (multiple producers, race conditions), but those are already part of DAG design responsibility. Many Airflow features assume users understand execution semantics; this doesn’t seem fundamentally different. Also, while it’s technically possible to replicate the old behavior via the public API, the workaround is not equivalent: * It requires an additional REST call (performance + complexity cost) * It doesn’t integrate with Jinja templating, so it breaks common operator patterns Given all this, would it make sense to reconsider either: * restoring the previous default behavior, or * introducing an explicit, supported mode (e.g. `task_ids="*"` or similar) that preserves the “latest value across tasks” semantics? This would keep the stricter behavior available while not breaking a pattern that is both documented and widely useful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
