Hello! Thanks a lot for the positive feedback and the numerous suggestions!
To give some additional context on why this is such a useful feature: past data is often too valuable to be discarded and must be fully processed, including when backfilling. When running a backfill spanning multiple months, a few task instances may fail. Finding the execution dates of these runs in order to display them in the Graph View or the Tree View is quite tricky. Currently, users have to rely on "Browse > DAG Runs" and filter by id & state, which is tedious. The idea of the calendar view is to expose the full state of the DAG and to highlight the dates for which there are failed runs. Clicking one of these days then gives easy access to the Tree View for that date.

Below are some answers to the points raised:

> A 3yr history is a lot, and most probably everyone out there cleanup data
> older than 3-6 months.

In the current proposal, only the years for which there are DAG runs are displayed. In my 2 screenshots, there were runs for 2019, 2020 and 2021, so these 3 years were displayed. However, if there were only runs for the past 3 months, only 2021 would be shown.

> Also, it might involve a heavy query for the datastore to handle.

The advantage of this view, besides conciseness, is that only the "dag_run" table is queried. The query returns at most "3 * number of days with DAG runs" records, one per day and state (as there are 3 possible DAG run states).

> I would prefer a week or month view like we have in the Google calendar and
> an option to switch between them and also move back and forward.

I tend to think that this might somewhat defeat the purpose of this view. Displaying a week or a month of DAG runs at once can already be achieved with the Tree View. However, that view is very "taxing" on the datastore and quickly becomes difficult to read as the number of displayed runs grows.

> I wonder if we could provide more context at a glance than just green/red.
> Possibly a gradient of percentage success/failed per day?

That would be possible, even if it may not be immediately obvious to users. I would however avoid using yellow or orange, as these colors are already used for task states (up_for_retry and upstream_failed), which might be confusing.

> In the proposal, you mentioned both scheduled and manual triggered DAGs.
> Are they different "view options" for the calendar view that you can switch
> between

It's exactly the same view and the same logic, both in the backend and the frontend. The second screenshot was meant to highlight how this view makes it easier to visualize the execution dates of a manually triggered DAG compared to the Tree View.

> Would be nice to have an option for the user to choose a "start_date" and
> "end_date" for the calendar view? But I am not sure about this, because it
> seems overlay with the tree view

I agree that it would overlap with the Tree View. The view is compact enough, and the query to the DB & backend "cheap" enough, that adding date pickers may not bring much performance benefit. In addition, the main "selling point" of this view, to me, is that it displays an overview of the full DAG state in one go, which the Tree View cannot provide when the number of runs is even moderately large. I'm not completely opposed to the idea if there is a compelling use case for it, though.

> I’m wondering if we could modify the presentation to remove the gaps
> between months and instead outline months (similar to the following
> screenshot)? At first glance, they could be misconstrued as gaps in runs.

I believe it is possible without too much added complexity. I will have a go at it later this week.

If everybody is happy with this proposal moving forward, what would be the next step? Should I create an AIP on Confluence (I do not have the permission to do so, my Confluence login is bhanotte), or would you prefer that I tidy up my implementation, address the comments above and open a PR with it?

Thank you!

Benoit

On Tue, Apr 13, 2021 at 6:11 PM Ryan Hamilton <[email protected]> wrote:

> Here is the first screenshot if the image didn’t come through:
> https://r.hmlt.in/8Lubnr2e
>
> The second one is moot since I saw that Benoit already addressed my
> suggestion in the provided code.
>
> -rh
>
> On Apr 13, 2021, at 1:04 PM, Xinbin Huang <[email protected]> wrote:
>
> Hi Ryan
>
> Thank you for correcting me. I was thinking of daily DAGs only when I
> suggested graph view, and tree view definitely makes more sense for
> multiple DAG runs per day.
>
> Your images seem to have some problems showing up on my side, not sure if
> other people can see them.
>
> Best
> Bin
>
> On Tue, Apr 13, 2021 at 9:49 AM Ryan Hamilton
> <[email protected]> wrote:
>
>> In general, I really like this idea—it should be a useful visualization.
>>
>> For the click destination, I think the Tree view does make more sense
>> given multiple runs can occur per day. The Graph view is limited to a
>> single run (which might not be the problematic one that instigated the
>> click).
>>
>> I agree w/ Xinbin, it should probably have a base date/range selection.
>> Displaying “all time” history is a bit inconsistent with all of the other
>> views.
>>
>> I like Sumit’s suggestion of having month and week views as well.
>> Certainly something this could evolve to add in the future.
>>
>> I’m wondering if we could modify the presentation to remove the gaps
>> between months and instead outline months (similar to the following
>> screenshot)? At first glance, they could be misconstrued as gaps in runs.
>>
>> [image: image.png]
>>
>> We should also add a link to the shortcuts to keep the navigation
>> consistent:
>>
>> [image: image.png]
>>
>> On Tue, Apr 13, 2021 at 12:05 PM Xinbin Huang <[email protected]>
>> wrote:
>>
>>> Really like it!
>>>
>>> Some quick thoughts:
>>> - I think it will be better to have clicking redirect you to the graph
>>>   view instead of the tree view
>>> - In the proposal, you mentioned both scheduled and manual triggered
>>>   DAGs. Are they different "view options" for the calendar view that you
>>>   can switch between? Or they are shown together probably with some
>>>   visual differences?
>>> - Would be nice to have an option for the user to choose a "start_date"
>>>   and "end_date" for the calendar view? But I am not sure about this,
>>>   because it seems overlay with the tree view
>>>
>>> Cheers
>>> Bin
>>>
>>> On Mon, Apr 12, 2021 at 9:12 PM Sumit Maheshwari <[email protected]>
>>> wrote:
>>>
>>>> Nice thoughts, it would be a good addition to Airflow.
>>>>
>>>> A couple of suggestions:
>>>>
>>>> - A 3yr history is a lot, and most probably everyone out there
>>>>   cleanup data older than 3-6 months. Also, it might involve a heavy
>>>>   query for the datastore to handle. I would prefer a week or month
>>>>   view like we have in the Google calendar and an option to switch
>>>>   between them and also move back and forward.
>>>> - Maybe use yellow or orange color to denote days where some
>>>>   failures and some successes happened.
>>>> - The color codes used to represent task states need to be removed
>>>>   from the Calendar view and maybe introduce similar color codes to
>>>>   represent DAG states.
>>>>
>>>> On Tue, Apr 13, 2021 at 5:52 AM Kaxil Naik <[email protected]> wrote:
>>>>
>>>>> Nice, I like it too, only minor suggestion is that it should be after
>>>>> Tree View and Graph View in the tab above.
>>>>>
>>>>> Regards,
>>>>> Kaxil
>>>>>
>>>>> On Mon, Apr 12, 2021 at 11:22 PM Brent Bovenzi
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Ryan Hamilton and I were talking about exactly this! Super excited to
>>>>>> see it. I'd be more than happy to help out if you need it.
>>>>>>
>>>>>> Quick thoughts:
>>>>>> - I wonder if we could provide more context at a glance than just
>>>>>>   green/red. Possibly a gradient of percentage success/failed per day?
>>>>>> - I don't believe it should be the default view for a DAG as it is
>>>>>>   mainly a historical view rather than a recent view.
>>>>>>
>>>>>> - Brent
>>>>>>
>>>>>> On Mon, Apr 12, 2021 at 6:00 PM Benoit H <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I would like to share with you a proposal for the implementation of
>>>>>>> a dag "calendar view" in the Airflow UI, which is a feature that I find
>>>>>>> very useful when managing dags with a large number of dag runs.
>>>>>>>
>>>>>>> The aim is to provide visibility over the full state of the dag by
>>>>>>> displaying the aggregated dag runs' states in a calendar.
>>>>>>>
>>>>>>> Each day is displayed with a color according to the dag runs' states
>>>>>>> for that day:
>>>>>>>
>>>>>>> - If at least one dag run has failed for a day, that day will be
>>>>>>>   displayed as "failed".
>>>>>>>
>>>>>>> - If all dag runs have succeeded the day will be shown as
>>>>>>>   "succeeded".
>>>>>>>
>>>>>>> - If there are still running dag runs (and no failed dag run) for
>>>>>>>   that day, the day will be shown as "running".
>>>>>>>
>>>>>>> Clicking on a day redirects to the tree view for that day.
>>>>>>>
>>>>>>> This makes it possible to monitor the state of thousands of dag runs
>>>>>>> in a single view that is concise and easy to understand. It is
>>>>>>> particularly useful to monitor the state of large backfills.
>>>>>>>
>>>>>>> You may find screenshots, as well as additional details, in the
>>>>>>> following Google doc:
>>>>>>> https://docs.google.com/document/d/1fayWWbia7r1iPuHL23JeKJCP5JcKdOlHpLzrdAH0nT4/edit?usp=sharing
>>>>>>> .
>>>>>>>
>>>>>>> A prototype implementation is available at
>>>>>>> https://github.com/BenoitHanotte/airflow/pull/2/files.
>>>>>>>
>>>>>>> I'd gladly get your feedback on the idea, and on whether it is worth
>>>>>>> moving forward by creating an AIP to formalize this proposal.
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> Benoit Hanotte
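
P.S. For illustration only, the per-day rule from the original proposal (a day is "failed" if at least one run failed, "running" if some run is still running and none failed, and "succeeded" otherwise) could be sketched roughly as below. This is plain Python and not the code from the prototype: the function names, the (day, state, count) row shape and the state strings "success", "running" and "failed" are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def day_status(counts):
    """Collapse the per-state run counts of one day into a single status.

    `counts` maps a state string to the number of DAG runs in that state.
    The state names are assumed for this sketch.
    """
    if counts.get("failed", 0) > 0:
        return "failed"    # at least one failed run marks the whole day
    if counts.get("running", 0) > 0:
        return "running"   # nothing failed, but some runs are not finished
    if counts.get("success", 0) > 0:
        return "success"   # every run of the day succeeded
    return None            # no runs recorded for that day


def calendar_states(rows):
    """Aggregate (day, state, count) rows into {day: status}.

    `rows` stands in for the result of a grouped query on the DAG run
    table: at most one row per day and state, hence at most three rows
    per day when there are three possible states.
    """
    per_day = defaultdict(dict)
    for day, state, count in rows:
        per_day[day][state] = per_day[day].get(state, 0) + count
    return {day: day_status(counts) for day, counts in per_day.items()}


# Hypothetical sample data: three days with different mixes of states.
sample_rows = [
    (date(2021, 4, 12), "success", 3),
    (date(2021, 4, 12), "failed", 1),
    (date(2021, 4, 13), "success", 2),
    (date(2021, 4, 13), "running", 1),
    (date(2021, 4, 14), "success", 24),
]
statuses = calendar_states(sample_rows)
```

With the sample rows above, April 12 comes out as "failed", April 13 as "running" and April 14 as "success". Because the input is already grouped by day and state, the work grows with the number of days that have runs rather than with the number of runs.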
