Well, thanks for forcing us to make the case :)

I'll initiate a lazy consensus "vote"


On Fri, Aug 5, 2022 at 1:01 PM Vikram Koka <vik...@astronomer.io.invalid>
wrote:

> Fair points all.
> I withdraw my objection and agree with the consolidation proposal.
>
> Vikram
>
>
> On Fri, Aug 5, 2022 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Let me also add to why I think we should consolidate to one param::
>>
>> * "schedule_interval"
>> * "schedule_on"
>> * "timetable"
>>
>> Imagine you are a new user coming to Airflow.
>>
>> Do you immediately recognise which one is which ? Do you see what the
>> difference is?
>>
>> * "Schedule_on" = "Every Sunday" is equally good as "schedule_on"
>> "dataset".
>> * "timetable" - what even is a timetable? Is it a "cron TAB" ? Or what?
>>
>> For me even if we wanted to have different names, the current set of
>> names for those params is just terribly ambiguous.
>>
>> Making it "schedule" and acting depending on what you pass to it is so
>> much better and simpler.
>>
>> J.
>>
>>
>> On Fri, Aug 5, 2022 at 6:57 PM Daniel Standish
>> <daniel.stand...@astronomer.io.invalid> wrote:
>>
>>> @Vikram, yeah, I understand that concern.  Making a breaking change is
>>> something we shouldn't do carelessly, without consideration, without good
>>> reason.  But I think we have a good reason here.
>>>
>>> Deprecations are somewhat regular in airflow.  In 2.0 we moved a lot of
>>> imports.  We also changed the CLI semantics.  Probably there were other
>>> breaking changes that I'm forgetting *(granted sometimes we didn't
>>> remove the backward-compatibility immediately in 2.0, and that remains an
>>> option here)*.  Probably just about every dag in existence had to be
>>> changed then too.  And that's what a major release is for.
>>>
>>> In 3.0, it looks like we're deprecating execution_date...  That's even
>>> bigger.  So, were we to do this, (1) moving to `schedule` wouldn't be the
>>> only notable breaking change and (2) by comparison it's a pretty simple
>>> change.  And if, as part of the upgrade to a major release, you have to do
>>> renames and such anyway, the marginal cost of one more isn't massive.
>>>
>>> I guess my take is that we can't be chained to our previous decisions
>>> forever.  And we have to have the flexibility to adapt our interfaces so
>>> that they still make sense and are consistent and don't cause excessive
>>> eyebrow wrinkling.  We have adopted semantic versioning to provide a
>>> mechanism for signaling when breaking changes are coming.  And we can add
>>> tooling to help users along the way.
>>>
>>> Additionally, we can't *only *look at the burden of such a change for
>>> users. We must also weigh this against the burden that we maintain by
>>> *not* changing, i.e. by having too many params, or ill-named params.
>>>
>>>
>>> On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <jedcunning...@apache.org>
>>> wrote:
>>>
>>>> I'm not in favor of reusing `schedule_interval` for datasets (or
>>>> folding `timetable` into it either), primarily because it doesn't feel like
>>>> a good name to describe what it does. If we don't have the appetite to
>>>> consolidate, my vote is we stick with 3 separate params. So preferably #1,
>>>> if not then #2 for me.
>>>>
>>>> I'm less concerned about the "everyone has to update every DAG" issue,
>>>> as it only becomes a problem for Airflow 3, folks can freely ignore it
>>>> until then, and when they do update it it's a simple find/replace.
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs <
>>>> a.r.gibbs.massma...@gmail.com> wrote:
>>>>
>>>>> I'll just +1 that IMO a UI reminder is a great option.
>>>>>
>>>>> 2.3.0 added an alert when robots.txt was being scanned, which was
>>>>> great, because I ignored it for a while, but eventually caved and looked
>>>>> into it. This I would see as more of the same. I currently get loads of
>>>>> warnings about things in my config file that have moved; but typically 
>>>>> only
>>>>> if I have to do a `airflow db upgrade` or `airflow dags backfill`. I see
>>>>> the alert, think "I must fix those", and then the Homer effect kicks in,
>>>>> and something else takes up the space in my brain. Having them as a banner
>>>>> in the GUI would 100% get me to sort it sooner or later.
>>>>>
>>>>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka
>>>>> <vik...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> Hi Daniel,
>>>>>>
>>>>>> I completely agree with wanting to consolidate, but I am very
>>>>>> concerned about people needing to change their existing code.
>>>>>> As Ping and Felix also said, I worry that people ignore deprecation
>>>>>> warnings, especially since the original code authors don't really look at
>>>>>> logs or other pipeline artifacts unless something is broken.
>>>>>>
>>>>>> With respect to the options you outlined, I am personally leaning
>>>>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on 
>>>>>> (AIP-48
>>>>>> has added it in main) and take List[Dataset] in `schedule_interval`". I
>>>>>> could be convinced with (3), since I think that "timetable" is relatively
>>>>>> recent and have not seen as much usage for this invocation yet. I am
>>>>>> definitely curious about other perspectives on this as well.
>>>>>>
>>>>>> I have a hard time with option (1) of deprecating "schedule
>>>>>> interval", because of the historical nature of it and the code changes 
>>>>>> that
>>>>>> people may have to make to deal with it.
>>>>>>
>>>>>> Thanks for raising this topic and the discussion,
>>>>>>
>>>>>> Vikram
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
>>>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> I agree with those (including myself) who have suggested that it
>>>>>>> would be great to have a deprecation fix tool.
>>>>>>>
>>>>>>> I also think that UI notifications would be valuable.  Another
>>>>>>> option would be to provide a less intrusive indicator somewhere (like 
>>>>>>> when
>>>>>>> chrome lets you know you need to update), and then you click on it and 
>>>>>>> it
>>>>>>> takes you to a page that lists out all the deprecation warnings.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I know we are distracting a bit from the main subject, but I think
>>>>>>>> it's worth it.
>>>>>>>>
>>>>>>>> I think this is a fact of life that people don't remove
>>>>>>>> deprecations. But we have a choice of "not doing anything about it" and
>>>>>>>> complaining. Or doing something that might improve it.
>>>>>>>>
>>>>>>>> I usually err on the side of "doing something" when I see a
>>>>>>>> problem. I am a doer :).
>>>>>>>>
>>>>>>>> And I have a possibly controversial proposal. I got the inspiration
>>>>>>>> from node.js old versions. Whenever an old version of Node is being 
>>>>>>>> used,
>>>>>>>> you have a 20 seconds delay artificially added and information that in
>>>>>>>> order to remove those 20 seconds you need to upgrade.
>>>>>>>>
>>>>>>>> Why don't we do that for some really old/serious deprecations ?
>>>>>>>> Maybe not 20 seconds. Maybe adding an unremovable RED warning in the 
>>>>>>>> UI if
>>>>>>>> you have any deprecation unhandled. Something that is annoying enough 
>>>>>>>> that
>>>>>>>> will cause people to make an action but not intrusive enough to break
>>>>>>>> things.
>>>>>>>>
>>>>>>>> I think this is a very basic psychological trait:
>>>>>>>>
>>>>>>>> If it does not cost the user anything to leave the deprecations,
>>>>>>>> there will be a big number of people who will do that.
>>>>>>>> We need to make sure that deprecations are costing our users "a
>>>>>>>> little" and that they are aware that they can remove the cost (and 
>>>>>>>> know how
>>>>>>>> - this is super important that we only show such a message with clear
>>>>>>>> instructions on what to do).
>>>>>>>>
>>>>>>>> Such a message might be annoying in many ways:
>>>>>>>>
>>>>>>>> * it can obstruct significant part of the screen
>>>>>>>> * It can be flashy and had very annoying colours (think bold pink
>>>>>>>> fonts on flash-green background for example)
>>>>>>>> * It can not allow to do anything before the user closes it with a
>>>>>>>> REALLY small 'x' that it is difficult to point at (think cookie 
>>>>>>>> banners).
>>>>>>>> And when they click they can be greeted with "You know - you can 
>>>>>>>> actually
>>>>>>>> remove this warning by doing THIS".
>>>>>>>> * and we have probably more options - our UX specialists might have
>>>>>>>> some good ideas here
>>>>>>>>
>>>>>>>> For anyone who complains about "people do not act on deprecation."
>>>>>>>> - yeah this is true.
>>>>>>>> But one universal truth there is that if we won't act, this will
>>>>>>>> change nothing in the behaviour of our users - this is 100% certain.
>>>>>>>>
>>>>>>>> If we do something - it might improve it. We will not get a 100%
>>>>>>>> success for sure, but we can at least do something to improve it.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> J.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall
>>>>>>>> <felue...@pm.me.invalid> wrote:
>>>>>>>>
>>>>>>>>> Hi Ping,
>>>>>>>>>
>>>>>>>>> I had this idea about auto-fixing deprecation warnings too when I
>>>>>>>>> started the airflint project.
>>>>>>>>> I totally agree with what you said that often people ignore these
>>>>>>>>> warnings and therefore a cli tool would come in very handy. It could 
>>>>>>>>> look
>>>>>>>>> similar to the airflint project and also be based on the python 
>>>>>>>>> refactor
>>>>>>>>> package, but I would keep it out of this discussion for now.
>>>>>>>>>
>>>>>>>>> Regarding the schedule arg, I don’t have a strong opinion there
>>>>>>>>> but also tend to having one schedule arg, because of simplicity
>>>>>>>>> for the dag author.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Felix
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sent from Proton Mail for iOS
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pin...@umich.edu> wrote:
>>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> +1 for ‘schedule' and thanks for bringing this up.
>>>>>>>>>
>>>>>>>>> I agree with Vikram, we should be very careful about deprecating
>>>>>>>>> existing params even if we have warnings around it. Not sure if this 
>>>>>>>>> is a
>>>>>>>>> general case, but I notice that people ignore warnings all the time 
>>>>>>>>> until
>>>>>>>>> it starts to break. Thus I am thinking if airflow can have a cli tool 
>>>>>>>>> to
>>>>>>>>> help users to batch update these deprecating params just like python's
>>>>>>>>> futurize tool. This can greatly ease the burden of upgrading the 
>>>>>>>>> airflow
>>>>>>>>> version.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Ping
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> In case you forgot about us @Vikram, gentle nudge
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>>>>>>>> daniel.stand...@astronomer.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>>>>>>>
>>>>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe you
>>>>>>>>>>> could provide a little more detail about your concerns?
>>>>>>>>>>>
>>>>>>>>>>> There are certainly alternatives, and of course each of them has
>>>>>>>>>>> tradeoffs.
>>>>>>>>>>>
>>>>>>>>>>> Here are the options I see:
>>>>>>>>>>>
>>>>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and
>>>>>>>>>>> timetable; the proposal under discussion in this thread)
>>>>>>>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has
>>>>>>>>>>> added it in main) and take List[Dataset] in `schedule_interval`
>>>>>>>>>>>
>>>>>>>>>>> Which would you propose we go with?
>>>>>>>>>>>
>>>>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>>>>>>>> territory since this is still a discussion thread), my vote would
>>>>>>>>>>> definitely be to go with 3.
>>>>>>>>>>>
>>>>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad,
>>>>>>>>>>> but `schedule` definitely seems better. From a new user's 
>>>>>>>>>>> perspective,
>>>>>>>>>>> having misleading names is confusing, and can be a negative when 
>>>>>>>>>>> looking at
>>>>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing 
>>>>>>>>>>> with
>>>>>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, 
>>>>>>>>>>> though
>>>>>>>>>>> it can be bothersome, certainly.
>>>>>>>>>>>
>>>>>>>>>>> Just to look at / compare feasibility.
>>>>>>>>>>>
>>>>>>>>>>> Deprecating schedule_interval, it would be more prominent than
>>>>>>>>>>> some other deprecations because it's a DAG param. That said, I 
>>>>>>>>>>> think it's
>>>>>>>>>>> pretty straightforward to handle. For one, it's at the cardinality 
>>>>>>>>>>> of DAG
>>>>>>>>>>> and not *task*, so there is less to change with than some other
>>>>>>>>>>> deprecations. And being kwargs-only, it's an easy find and replace 
>>>>>>>>>>> that
>>>>>>>>>>> could be automated with an `upgrade-check` script. It's very easy to
>>>>>>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail 
>>>>>>>>>>> the DAG
>>>>>>>>>>> parse (which forces user to fix). Perhaps we could even experiment 
>>>>>>>>>>> with a
>>>>>>>>>>> deprecation fix tool, so that users could get help fixing 
>>>>>>>>>>> deprecations, not
>>>>>>>>>>> necessarily in the context of an upgrade, since we're not looking 
>>>>>>>>>>> at 3.0
>>>>>>>>>>> any time soon.
>>>>>>>>>>>
>>>>>>>>>>> We could compare with execution_date, if execution_date is to be
>>>>>>>>>>> removed in 3.0. I think the deprecation of execution_date will be a 
>>>>>>>>>>> more
>>>>>>>>>>> difficult thing for users to handle. For one, potentially higher
>>>>>>>>>>> cardinality, since it is used in tasks. But also, the use of 
>>>>>>>>>>> execution_date
>>>>>>>>>>> can be harder to detect, since it can be in templates and elsewhere 
>>>>>>>>>>> in
>>>>>>>>>>> custom operators. *(With schedule_interval, since it's a
>>>>>>>>>>> top-level dag kwarg, it will be easier.)* And execution_date
>>>>>>>>>>> was replaced with `logical_date` for similar reasons -- avoiding 
>>>>>>>>>>> user
>>>>>>>>>>> confusion.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>>>>>>>> <vik...@astronomer.io.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> -1 from me.
>>>>>>>>>>>>
>>>>>>>>>>>> Though I agree in principle with the idea of consolidation, I
>>>>>>>>>>>> don't think we should be doing this yet until we understand the
>>>>>>>>>>>> implications completely.
>>>>>>>>>>>> I am really not in favor of deprecation of the existing params,
>>>>>>>>>>>> unless there is really no alternative.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> So far, seems all in favor.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will just highlight, in case it's not clear.... when we
>>>>>>>>>>>>> release this (presumably 2.4), basically every single dag in 
>>>>>>>>>>>>> existence will
>>>>>>>>>>>>> start emitting deprecation warnings, and prior to 3.0, basically 
>>>>>>>>>>>>> every dag
>>>>>>>>>>>>> in existence will need to be updated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thankfully, for most people, this should be an easy update
>>>>>>>>>>>>> since DAG is kwargs only, IIRC.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>>>>>>>> <drew.h...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>>>>>>>> consolidating to one being more important than the name of that 
>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>>>>>>>> josh.d.f...@astronomer.io.INVALID> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1
>>>>>>>>>>>>>>> from me. `schedule` seems like a nice catch-all.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>>>>>>>> jedcunning...@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>

Reply via email to