Hey everyone,

I updated our meeting notes document in the Airflow wiki with the notes from our dev call on Thursday, the 5th of December. The notes are here: <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-Summary.14>

I loved the progress on the FAB compatibility project, DAG Bundles and Versioning, Data Assets, and the discussion around data completeness. Great work, team!

To everyone who attended the meeting, please check the summary and add anything I may have missed. For those who could not join, please let us know if you disagree with anything that was discussed and agreed upon in the meeting. Also, please do ask questions if something is unclear.

There's already an initial agenda for our next dev call, which is scheduled for 19th Dec. If you would like something added to the proposed agenda for that meeting, please add it here <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-(Proposed)Agenda.4> or let me know.

Best regards and talk to you all soon,
Vikram

--
Below is the summary from the call on Thursday:
--

- Follow-up on action items from the last call:
  - Update on the FAB provider for backwards compatibility project (Jed Cunningham and Vincent Beck):
    - Jed and Vincent shared the progress to date, including a PR <https://github.com/apache/airflow/pull/44464> that already implements plugin backwards compatibility.
    - Vikram Koka expressed appreciation for the progress and asked about the expected timing for the remaining items. Their response was that, apart from the items blocked on completion of the new UI, the remaining work could be done by mid-January.
    - Jens Scheffler suggested that the PR to validate dependencies without the new UI be created as a draft and validated against the existing functionality of the new UI, rather than waiting for the new UI to be completed.
  - Update on performance benchmark scenarios (Michal Modras):
    - Augusto shared the thinking around performance benchmark scenarios and metrics <https://docs.google.com/document/d/1kyKXkILkHSrkXYCnje-Lev4I983szjfI1tFKhqimj_8/>, with a focus on DAG performance and resource consumption.
    - Augusto shared that this was a follow-up to the work already done in AIP-59 and would be based on the existing performance framework.
    - There was significant discussion around task timings and whether they were relevant for realistic performance benchmarks.
    - Jens asked whether this would cover different executors, and Augusto responded that it would cover the Celery executor first and possibly the Kubernetes executor later.
    - Jens and Vikram brought up comparing the performance of Airflow 2.10 vs. Airflow 3 to identify performance differences. Augusto confirmed that all the tests would be run on both Airflow 2 and Airflow 3 to confirm performance changes.
- Development updates and presentations:
  - Update on AIP-75 New Asset-Centric Syntax <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-75+New+Asset-Centric+Syntax> (TP Chung):
    - TP shared a recording of the new syntax for asset creation.
    - TP also demoed a new Airflow CLI command to list all Data Assets and to show the details of a specified Data Asset.
    - Finally, TP introduced the "materialize" command for a Data Asset, which ensures that the asset is created by running the DAG that outputs it.
  - Update on AIP-66: DAG Bundles & Parsing <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=294816356> (Jed Cunningham):
    - Jed demonstrated how DAG bundles are defined and how they would be parsed by the DAG processor.
    - He mentioned that some of the changes are happening in conjunction with the changes being made in AIP-72.
    - He also showed bundle IDs and bundle versions, and then showed how a new version is parsed and reprocessed.
    - He mentioned that there is much more work to be done, but the core of bundle definitions and of DAGs being processed from those bundles is now in place.
    - In response to questions, he clarified that DAG bundles currently pull down the entire Git clone into a temporary folder, so that all DAGs and their related files and dependencies can be processed, and that further optimization is very feasible.
  - Update on AIP-78 Scheduler-managed backfill <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-78+Scheduler-managed+backfill> (Daniel Standish):
    - Daniel said that all the back-end server work for this AIP, as scoped, has been complete for a while. He added that the front-end UI work will be done as part of AIP-38.
    - He added, however, that this raises a data completeness conversation, which led to the discussion below.
- Discussion topics:
  - Data completeness discussion (Daniel Standish):
    - Daniel brought up the implicit data partitioning that already exists in Airflow through the execution date when catchup is set to True.
    - Daniel advocated making this implicit data partitioning an explicit concept in Airflow, arguing that the existing grid view is already an incarnation of it.
    - At a high level, users could declare that a DAG is partition-driven, based on its timetable. Going forward, backfill and catchup would only be supported for partition-driven DAGs.
    - For backwards compatibility, existing DAGs would be assumed to be partition-driven.
    - The team's immediate reaction was that this is a big change, and there was significant discussion about whether it is absolutely required.
    - Daniel said that the trigger for this was AIP-78 Scheduler-managed backfill <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-78+Scheduler-managed+backfill> and AIP-83 Remove Execution Date Unique Constraint from DAG run <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-83+Rename+execution_date+-%3E+logical_date+and+remove+unique+constraint>, which left a bit of a vacuum between them.
    - The follow-up action item after the discussion was for Daniel to share his thoughts async and for everyone to think about the need for this.
  - Milestone and scope update (Vikram Koka):
    - Vikram shared that, at a high level, development was on track with the plan shared earlier.
    - However, there would be one scope change: AIP-80 Explicit Template Fields in Operator Arguments <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-80+Explicit+Template+Fields+in+Operator+Arguments> is being deferred from 3.0 to a future 3.x release.
- Action items on/before the next dev call:
  - Daniel Standish to post a document on explicit vs. implicit partitioning and why it is needed as a result of the removal of execution date, especially with an eye towards backwards compatibility. Team to consider the introduction of a partition concept in Airflow.