Well, thanks for forcing us to make the case :) I'll initiate a lazy consensus "vote"
On Fri, Aug 5, 2022 at 1:01 PM Vikram Koka <vik...@astronomer.io.invalid> wrote: > Fair points all. > I withdraw my objection and agree with the consolidation proposal. > > Vikram > > > On Fri, Aug 5, 2022 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote: > >> Let me also add to why I think we should consolidate to one param:: >> >> * "schedule_interval" >> * "schedule_on" >> * "timetable" >> >> Imagine you are a new user coming to Airflow. >> >> Do you immediately recognise which one is which ? Do you see what the >> difference is? >> >> * "Schedule_on" = "Every Sunday" is equally good as "schedule_on" >> "dataset". >> * "timetable" - what even is a timetable? Is it a "cron TAB" ? Or what? >> >> For me even if we wanted to have different names, the current set of >> names for those params is just terribly ambiguous. >> >> Making it "schedule" and acting depending on what you pass to it is so >> much better and simpler. >> >> J. >> >> >> On Fri, Aug 5, 2022 at 6:57 PM Daniel Standish >> <daniel.stand...@astronomer.io.invalid> wrote: >> >>> @Vikram, yeah, I understand that concern. Making a breaking change is >>> something we shouldn't do carelessly, without consideration, without good >>> reason. But I think we have a good reason here. >>> >>> Deprecations are somewhat regular in airflow. In 2.0 we moved a lot of >>> imports. We also changed the CLI semantics. Probably there were other >>> breaking changes that I'm forgetting *(granted sometimes we didn't >>> remove the backward-compatibility immediately in 2.0, and that remains an >>> option here)*. Probably just about every dag in existence had to be >>> changed then too. And that's what a major release is for. >>> >>> In 3.0, it looks like we're deprecating execution_date... That's even >>> bigger. So, were we to do this, (1) moving to `schedule` wouldn't be the >>> only notable breaking change and (2) by comparison it's a pretty simple >>> change. And if, as part of the upgrade to a major release, you have to do >>> renames and such anyway, the marginal cost of one more isn't massive. >>> >>> I guess my take is that we can't be chained to our previous decisions >>> forever. And we have to have the flexibility to adapt our interfaces so >>> that they still make sense and are consistent and don't cause excessive >>> eyebrow wrinkling. We have adopted semantic versioning to provide a >>> mechanism for signaling when breaking changes are coming. And we can add >>> tooling to help users along the way. >>> >>> Additionally, we can't *only *look at the burden of such a change for >>> users. We must also weigh this against the burden that we maintain by >>> *not* changing, i.e. by having too many params, or ill-named params. >>> >>> >>> On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <jedcunning...@apache.org> >>> wrote: >>> >>>> I'm not in favor of reusing `schedule_interval` for datasets (or >>>> folding `timetable` into it either), primarily because it doesn't feel like >>>> a good name to describe what it does. If we don't have the appetite to >>>> consolidate, my vote is we stick with 3 separate params. So preferably #1, >>>> if not then #2 for me. >>>> >>>> I'm less concerned about the "everyone has to update every DAG" issue, >>>> as it only becomes a problem for Airflow 3, folks can freely ignore it >>>> until then, and when they do update it it's a simple find/replace. >>>> >>>> >>>> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs < >>>> a.r.gibbs.massma...@gmail.com> wrote: >>>> >>>>> I'll just +1 that IMO a UI reminder is a great option. >>>>> >>>>> 2.3.0 added an alert when robots.txt was being scanned, which was >>>>> great, because I ignored it for a while, but eventually caved and looked >>>>> into it. This I would see as more of the same. I currently get loads of >>>>> warnings about things in my config file that have moved; but typically >>>>> only >>>>> if I have to do a `airflow db upgrade` or `airflow dags backfill`. I see >>>>> the alert, think "I must fix those", and then the Homer effect kicks in, >>>>> and something else takes up the space in my brain. Having them as a banner >>>>> in the GUI would 100% get me to sort it sooner or later. >>>>> >>>>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka >>>>> <vik...@astronomer.io.invalid> wrote: >>>>> >>>>>> Hi Daniel, >>>>>> >>>>>> I completely agree with wanting to consolidate, but I am very >>>>>> concerned about people needing to change their existing code. >>>>>> As Ping and Felix also said, I worry that people ignore deprecation >>>>>> warnings, especially since the original code authors don't really look at >>>>>> logs or other pipeline artifacts unless something is broken. >>>>>> >>>>>> With respect to the options you outlined, I am personally leaning >>>>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on >>>>>> (AIP-48 >>>>>> has added it in main) and take List[Dataset] in `schedule_interval`". I >>>>>> could be convinced with (3), since I think that "timetable" is relatively >>>>>> recent and have not seen as much usage for this invocation yet. I am >>>>>> definitely curious about other perspectives on this as well. >>>>>> >>>>>> I have a hard time with option (1) of deprecating "schedule >>>>>> interval", because of the historical nature of it and the code changes >>>>>> that >>>>>> people may have to make to deal with it. >>>>>> >>>>>> Thanks for raising this topic and the discussion, >>>>>> >>>>>> Vikram >>>>>> >>>>>> >>>>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish >>>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>>> >>>>>>> I agree with those (including myself) who have suggested that it >>>>>>> would be great to have a deprecation fix tool. >>>>>>> >>>>>>> I also think that UI notifications would be valuable. Another >>>>>>> option would be to provide a less intrusive indicator somewhere (like >>>>>>> when >>>>>>> chrome lets you know you need to update), and then you click on it and >>>>>>> it >>>>>>> takes you to a page that lists out all the deprecation warnings. >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> >>>>>>> wrote: >>>>>>> >>>>>>>> I know we are distracting a bit from the main subject, but I think >>>>>>>> it's worth it. >>>>>>>> >>>>>>>> I think this is a fact of life that people don't remove >>>>>>>> deprecations. But we have a choice of "not doing anything about it" and >>>>>>>> complaining. Or doing something that might improve it. >>>>>>>> >>>>>>>> I usually err on the side of "doing something" when I see a >>>>>>>> problem. I am a doer :). >>>>>>>> >>>>>>>> And I have a possibly controversial proposal. I got the inspiration >>>>>>>> from node.js old versions. Whenever an old version of Node is being >>>>>>>> used, >>>>>>>> you have a 20 seconds delay artificially added and information that in >>>>>>>> order to remove those 20 seconds you need to upgrade. >>>>>>>> >>>>>>>> Why don't we do that for some really old/serious deprecations ? >>>>>>>> Maybe not 20 seconds. Maybe adding an unremovable RED warning in the >>>>>>>> UI if >>>>>>>> you have any deprecation unhandled. Something that is annoying enough >>>>>>>> that >>>>>>>> will cause people to make an action but not intrusive enough to break >>>>>>>> things. >>>>>>>> >>>>>>>> I think this is a very basic psychological trait: >>>>>>>> >>>>>>>> If it does not cost the user anything to leave the deprecations, >>>>>>>> there will be a big number of people who will do that. >>>>>>>> We need to make sure that deprecations are costing our users "a >>>>>>>> little" and that they are aware that they can remove the cost (and >>>>>>>> know how >>>>>>>> - this is super important that we only show such a message with clear >>>>>>>> instructions on what to do). >>>>>>>> >>>>>>>> Such a message might be annoying in many ways: >>>>>>>> >>>>>>>> * it can obstruct significant part of the screen >>>>>>>> * It can be flashy and had very annoying colours (think bold pink >>>>>>>> fonts on flash-green background for example) >>>>>>>> * It can not allow to do anything before the user closes it with a >>>>>>>> REALLY small 'x' that it is difficult to point at (think cookie >>>>>>>> banners). >>>>>>>> And when they click they can be greeted with "You know - you can >>>>>>>> actually >>>>>>>> remove this warning by doing THIS". >>>>>>>> * and we have probably more options - our UX specialists might have >>>>>>>> some good ideas here >>>>>>>> >>>>>>>> For anyone who complains about "people do not act on deprecation." >>>>>>>> - yeah this is true. >>>>>>>> But one universal truth there is that if we won't act, this will >>>>>>>> change nothing in the behaviour of our users - this is 100% certain. >>>>>>>> >>>>>>>> If we do something - it might improve it. We will not get a 100% >>>>>>>> success for sure, but we can at least do something to improve it. >>>>>>>> >>>>>>>> WDYT? >>>>>>>> >>>>>>>> J. >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall >>>>>>>> <felue...@pm.me.invalid> wrote: >>>>>>>> >>>>>>>>> Hi Ping, >>>>>>>>> >>>>>>>>> I had this idea about auto-fixing deprecation warnings too when I >>>>>>>>> started the airflint project. >>>>>>>>> I totally agree with what you said that often people ignore these >>>>>>>>> warnings and therefore a cli tool would come in very handy. It could >>>>>>>>> look >>>>>>>>> similar to the airflint project and also be based on the python >>>>>>>>> refactor >>>>>>>>> package, but I would keep it out of this discussion for now. >>>>>>>>> >>>>>>>>> Regarding the schedule arg, I don’t have a strong opinion there >>>>>>>>> but also tend to having one schedule arg, because of simplicity >>>>>>>>> for the dag author. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Felix >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Sent from Proton Mail for iOS >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pin...@umich.edu> wrote: >>>>>>>>> >>>>>>>>> Hi Daniel, >>>>>>>>> >>>>>>>>> +1 for ‘schedule' and thanks for bringing this up. >>>>>>>>> >>>>>>>>> I agree with Vikram, we should be very careful about deprecating >>>>>>>>> existing params even if we have warnings around it. Not sure if this >>>>>>>>> is a >>>>>>>>> general case, but I notice that people ignore warnings all the time >>>>>>>>> until >>>>>>>>> it starts to break. Thus I am thinking if airflow can have a cli tool >>>>>>>>> to >>>>>>>>> help users to batch update these deprecating params just like python's >>>>>>>>> futurize tool. This can greatly ease the burden of upgrading the >>>>>>>>> airflow >>>>>>>>> version. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Ping >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish >>>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>>>>>> >>>>>>>>>> In case you forgot about us @Vikram, gentle nudge >>>>>>>>>> >>>>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish < >>>>>>>>>> daniel.stand...@astronomer.io> wrote: >>>>>>>>>> >>>>>>>>>>> Well, this is a discussion thread, so let's discuss! >>>>>>>>>>> >>>>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe you >>>>>>>>>>> could provide a little more detail about your concerns? >>>>>>>>>>> >>>>>>>>>>> There are certainly alternatives, and of course each of them has >>>>>>>>>>> tradeoffs. >>>>>>>>>>> >>>>>>>>>>> Here are the options I see: >>>>>>>>>>> >>>>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and >>>>>>>>>>> timetable; the proposal under discussion in this thread) >>>>>>>>>>> 2. do nothing (keep 3 different scheduling params) >>>>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable) >>>>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has >>>>>>>>>>> added it in main) and take List[Dataset] in `schedule_interval` >>>>>>>>>>> >>>>>>>>>>> Which would you propose we go with? >>>>>>>>>>> >>>>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical >>>>>>>>>>> territory since this is still a discussion thread), my vote would >>>>>>>>>>> definitely be to go with 3. >>>>>>>>>>> >>>>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad, >>>>>>>>>>> but `schedule` definitely seems better. From a new user's >>>>>>>>>>> perspective, >>>>>>>>>>> having misleading names is confusing, and can be a negative when >>>>>>>>>>> looking at >>>>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing >>>>>>>>>>> with >>>>>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, >>>>>>>>>>> though >>>>>>>>>>> it can be bothersome, certainly. >>>>>>>>>>> >>>>>>>>>>> Just to look at / compare feasibility. >>>>>>>>>>> >>>>>>>>>>> Deprecating schedule_interval, it would be more prominent than >>>>>>>>>>> some other deprecations because it's a DAG param. That said, I >>>>>>>>>>> think it's >>>>>>>>>>> pretty straightforward to handle. For one, it's at the cardinality >>>>>>>>>>> of DAG >>>>>>>>>>> and not *task*, so there is less to change with than some other >>>>>>>>>>> deprecations. And being kwargs-only, it's an easy find and replace >>>>>>>>>>> that >>>>>>>>>>> could be automated with an `upgrade-check` script. It's very easy to >>>>>>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail >>>>>>>>>>> the DAG >>>>>>>>>>> parse (which forces user to fix). Perhaps we could even experiment >>>>>>>>>>> with a >>>>>>>>>>> deprecation fix tool, so that users could get help fixing >>>>>>>>>>> deprecations, not >>>>>>>>>>> necessarily in the context of an upgrade, since we're not looking >>>>>>>>>>> at 3.0 >>>>>>>>>>> any time soon. >>>>>>>>>>> >>>>>>>>>>> We could compare with execution_date, if execution_date is to be >>>>>>>>>>> removed in 3.0. I think the deprecation of execution_date will be a >>>>>>>>>>> more >>>>>>>>>>> difficult thing for users to handle. For one, potentially higher >>>>>>>>>>> cardinality, since it is used in tasks. But also, the use of >>>>>>>>>>> execution_date >>>>>>>>>>> can be harder to detect, since it can be in templates and elsewhere >>>>>>>>>>> in >>>>>>>>>>> custom operators. *(With schedule_interval, since it's a >>>>>>>>>>> top-level dag kwarg, it will be easier.)* And execution_date >>>>>>>>>>> was replaced with `logical_date` for similar reasons -- avoiding >>>>>>>>>>> user >>>>>>>>>>> confusion. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka >>>>>>>>>>> <vik...@astronomer.io.invalid> wrote: >>>>>>>>>>> >>>>>>>>>>>> -1 from me. >>>>>>>>>>>> >>>>>>>>>>>> Though I agree in principle with the idea of consolidation, I >>>>>>>>>>>> don't think we should be doing this yet until we understand the >>>>>>>>>>>> implications completely. >>>>>>>>>>>> I am really not in favor of deprecation of the existing params, >>>>>>>>>>>> unless there is really no alternative. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish >>>>>>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> So far, seems all in favor. >>>>>>>>>>>>> >>>>>>>>>>>>> I will just highlight, in case it's not clear.... when we >>>>>>>>>>>>> release this (presumably 2.4), basically every single dag in >>>>>>>>>>>>> existence will >>>>>>>>>>>>> start emitting deprecation warnings, and prior to 3.0, basically >>>>>>>>>>>>> every dag >>>>>>>>>>>>> in existence will need to be updated. >>>>>>>>>>>>> >>>>>>>>>>>>> Thankfully, for most people, this should be an easy update >>>>>>>>>>>>> since DAG is kwargs only, IIRC. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi >>>>>>>>>>>>> <br...@astronomer.io.invalid> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> +1 to consolidating and "schedule". >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl >>>>>>>>>>>>>> <drew.h...@astronomer.io.invalid> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of >>>>>>>>>>>>>>> consolidating to one being more important than the name of that >>>>>>>>>>>>>>> one >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell < >>>>>>>>>>>>>>> josh.d.f...@astronomer.io.INVALID> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1 >>>>>>>>>>>>>>> from me. `schedule` seems like a nice catch-all. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham < >>>>>>>>>>>>>>> jedcunning...@apache.org> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> +1 on moving to a single `schedule` param. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>