kaxil commented on code in PR #1486:
URL: https://github.com/apache/airflow-site/pull/1486#discussion_r3044900640
##########
landing-pages/site/content/en/blog/airflow-3.2.0/index.md:
##########
@@ -0,0 +1,246 @@

---
title: "Apache Airflow 3.2.0: Data-Aware Workflows at Scale"
linkTitle: "Apache Airflow 3.2.0: Data-Aware Workflows at Scale"
author: "Rahul Vats"
github: "vatsrahul1001"
linkedin: "vats-rahul"
description: "Apache Airflow 3.2.0 introduces Asset partitioning for granular pipeline orchestration, multi-team deployments for enterprise scale, synchronous deadline alert callbacks, and continued progress toward full Task SDK separation."
tags: [Release]
date: "2026-04-07"
images: ["/blog/airflow-3.2.0/images/3.2.0.jpg"]
---

We're proud to announce the release of **Apache Airflow 3.2.0**! Airflow 3.1 put humans at the center of automated workflows. 3.2 brings that same precision to data: Asset partitioning for granular pipeline orchestration, multi-team deployments for enterprise scale, synchronous deadline alert callbacks, and continued progress toward full Task SDK separation.

**Details**:

📦 PyPI: https://pypi.org/project/apache-airflow/3.2.0/ \
📚 Docs: https://airflow.apache.org/docs/apache-airflow/3.2.0/ \
🛠️ Release Notes: https://airflow.apache.org/docs/apache-airflow/3.2.0/release_notes.html \
🐳 Docker Image: `docker pull apache/airflow:3.2.0` \
🚏 Constraints: https://github.com/apache/airflow/tree/constraints-3.2.0

# 🗂️ Asset Partitioning (AIP-76): Only the Right Work Gets Triggered

Asset partitioning has been one of the most requested additions to data-aware scheduling. If you work with date-partitioned S3 paths, Hive table partitions, BigQuery partitions, or really any partitioned data store, you've dealt with this: an upstream task updates one partition, and every downstream Dag fires regardless of which slice actually changed. It's wasteful, and for large deployments it creates real operational noise.

Asset partitioning in 3.2 makes this granular. Downstream Dags trigger only when the specific partition they care about gets updated. It's the biggest change to data-aware scheduling since Assets were introduced, and it turns partition-driven orchestration into something Airflow handles natively rather than something you work around.

## Key Capabilities

* **Partition-driven scheduling**: Dags trigger on specific partition updates, not every asset change
* **`CronPartitionTimetable`**: Schedule Dags against partitions using cron expressions. Also available in the Task SDK
* **Backfill for partitioned Dags**: Backfill historical partitions without re-triggering everything downstream (#61464)
* **Multi-asset partitions**: A single Dag can listen for partitions across multiple assets, which matters when your downstream work depends on several sources aligning (#60577)

For more advanced use cases, there are temporal and range partition mappers (#61522, #55247) for mapping time ranges and value ranges to partition keys, a partition key field on Dag run references (#61725) so you can inspect exactly which partition triggered a run, and `PartitionedAssetTimetable` for full control over how partition events from multiple assets get resolved into a unified trigger.
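To make the Dag run partition key concrete, here is a minimal, illustrative sketch of a downstream task that scopes its work to the slice that triggered the run. It only reuses the APIs shown in the full example below (`PartitionedAssetTimetable`, `StartOfHourMapper`, `dag_run.partition_key`); the asset name, Dag id, and S3-style path layout are made up, and passing a single asset to `PartitionedAssetTimetable` is an assumption to keep the sketch short, so check the Task SDK docs for the exact signature.

```py
from __future__ import annotations

from airflow.sdk import DAG, Asset, PartitionedAssetTimetable, StartOfHourMapper, task

# Hypothetical upstream asset: hourly event files laid out by partition key.
hourly_events = Asset(uri="s3://example-bucket/events/", name="hourly_events")

with DAG(
    dag_id="process_changed_events_partition",  # illustrative Dag id
    # Assumption: a single asset can be passed where the full example below
    # uses an asset condition (a & b).
    schedule=PartitionedAssetTimetable(
        assets=hourly_events,
        default_partition_mapper=StartOfHourMapper(),
    ),
    catchup=False,
):

    @task
    def process_partition(dag_run=None):
        # dag_run.partition_key identifies exactly which slice triggered this
        # run, so only that slice is read rather than the whole asset.
        print(f"processing s3://example-bucket/events/{dag_run.partition_key}/")

    process_partition()
```

In practice the partition key would feed whatever partition-addressing scheme your store uses: a date-partitioned path, a Hive partition spec, a BigQuery partition decorator, and so on.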
**Example**: Two upstream ingestion Dags each write to a separate asset on an hourly cadence. The downstream Dag triggers only when both have updated the same hourly partition. Since the two assets don't share a partition key natively, a mapper resolves them into a common key.

```py
from __future__ import annotations

from airflow.sdk import (
    DAG,
    Asset,
    CronPartitionTimetable,
    PartitionedAssetTimetable,
    StartOfHourMapper,
    asset,
    task,
)

team_a_player_stats = Asset(uri="file://incoming/player-stats/team_a.csv", name="team_a_player_stats")
combined_player_stats = Asset(uri="file://curated/player-stats/combined.csv", name="combined_player_stats")


# Upstream producer 1: writes the Team A asset once per hour (at minute 0).
with DAG(
    dag_id="ingest_team_a_player_stats",
    schedule=CronPartitionTimetable("0 * * * *", timezone="UTC"),
    tags=["player-stats", "ingestion"],
):

    @task(outlets=[team_a_player_stats])
    def ingest_team_a_stats():
        """Materialize Team A player statistics for the current hourly partition."""
        pass

    ingest_team_a_stats()


# Upstream producer 2: an asset-decorated function materialized hourly at minute 15.
@asset(schedule=CronPartitionTimetable("15 * * * *", timezone="UTC"))
def team_b_player_stats():
    pass


# Downstream consumer: runs only once both upstream assets have emitted events
# for the same hourly partition; StartOfHourMapper aligns their partition keys.
with DAG(
    dag_id="clean_and_combine_player_stats",
    schedule=PartitionedAssetTimetable(
        assets=team_a_player_stats & team_b_player_stats,
        default_partition_mapper=StartOfHourMapper(),
    ),
    catchup=False,
):

    @task(outlets=[combined_player_stats])
    def combine_player_stats(dag_run=None):
        """Merge the aligned hourly partitions into a combined dataset."""
        print(dag_run.partition_key)

    combine_player_stats()
```

See `example_asset_partition.py` and the Task SDK API docs for `PartitionedAssetTimetable` and partition mappers.

Review Comment:
   Worth linking to it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
