Hi Daniel, Are you talking about deprecating the decorator `@provide_bucket_name` or deprecating the decorator `@provide_bucket_name` AND the underlying feature of specifying a bucket through the connection? These are two different things, so please be specific. From your original email I understand you are talking about only the operators but from the last email I understand you want both.
If this is the latter, I agree with you that if we deprecate the decorator `provide_bucket_name`, we should also deprecate the underlying feature of specifying a bucket through the connection. What I was trying to explain in my email is we should not deprecate one and not the other. To the question "should we deprecate the feature of specifying a bucket through a connection", I dont have enough data to know how much this feature is used in Airflow. I am +0 on this one. On 2024/07/03 21:40:12 Daniel Standish wrote: > Thanks Vincent > > Re provide_bucket_name > > So, you've described what it does, but have not really made an argument > that we should keep it. > > There may be good arguments for keeping it. Indeed, if it were an obvious > choice to remove, I would not have brought it to the list! But, the mere > fact that it is there, is not really an argument for keeping it, because > changing connection structure and hook or operator behavior is a routine > thing that we can do in accordance with semver. > > Our decisions here would possibly be aided if we tracked the usage of this > kind of thing with scarf. In lieu of that, one option is to deprecate it > (raise a warning "this will be removed" when user is depending on the > bucket in the connection) and then see how much push back we get from > users... > > > > > > > > On Wed, Jul 3, 2024 at 11:25 AM Vincent Beck <vincb...@apache.org> wrote: > > > Overall, I agree with Daniel that these decorators can be very confusing > > (as user and maintainers). > > > > - `unify_bucket_name_and_key`. I am +1 to remove it. To me it does not > > offer real value and is a just a "hack" to presumably make user life > > easier. I prefer having a clear API with separate parameter for > > `bucket_name` and `key` than some kind of magic where `key` can be a full > > S3 URI or the file key. > > - `provide_bucket_name`. This one I think we need to keep it. As part of > > the AWS connection in Airflow, you can specify a S3 bucket: > > https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html#s3-bucket-configurations. > > I am not entirely sure of the intent behind but users can specify buckets > > as part of their connections. One (convenient way) for users then to use > > this bucket information is to use the decorator `provide_bucket_name` which > > then pass the bucket name down to the function decorated with > > `provide_bucket_name` > > > > On 2024/07/02 09:51:38 Jarek Potiuk wrote: > > > I think it's mostly internal Amazon/ Alibaba thing - and I'd defer and > > > mostly follow suggestions of those who have most stakes there - i.e. > > Amazon > > > team for amazon provider, alibaba users/team for Alibaba one. I am also > > +0 > > > on that one - sims like simplifying things is a good idea, also > > decorators > > > there are mostly for internal use - for operators (unless I am mistaken) > > - > > > so back-compatibility seems small. > > > From the "Airflow team" perspective I think one thing is important - how > > it > > > interacts with FSSPEC - because as of recently it's the "common" way we > > > suggest our users interact with object storages. If we can align it and > > > make the "one way" following the Amazon's / Alibaba FSSPEC integration, > > > that would be ideal. But I am not sure if this can be done. > > > > > > J. > > > > > > > > > > > > > > > On Tue, Jul 2, 2024 at 10:26 AM Wei Lee <weilee...@gmail.com> wrote: > > > > > > > It looks like only Alibaba and Amazon providers use these decorators. > > > > Removing them seems like simplifying things, but I’m not the actual > > user of > > > > these features. I would vote a +0 for removing them. We probably only > > need > > > > to bump the major version of these providers. > > > > > > > > Best, > > > > Wei > > > > > > > > > On Jul 2, 2024, at 12:23 PM, Amogh Desai <amoghdesai....@gmail.com> > > > > wrote: > > > > > > > > > > I understand that having multiple ways of doing the same thing is not > > > > > needed and we need > > > > > to slowly and eventually clean up these things to simplify the > > codebase > > > > and > > > > > also make it > > > > > maintainable. > > > > > > > > > > Aligning with one of the principles of Airflow3, we need to simplify > > > > > things, nuke unnecessary > > > > > stuff, remove deprecated code etc. This will have multiple benefits > > for > > > > the > > > > > ecosystem: > > > > > 1. Maintainability is not a headache. We will know for sure that > > users > > > > are > > > > > using certain > > > > > things in a certain way. > > > > > 2. It will make it easier for us to make some decisions like this one > > > > > without major discussions > > > > > and edge case considerations because we know how users are using > > certain > > > > > patterns (aligns with > > > > > point 1) > > > > > 3. Easier for new contributors to contribute to :) > > > > > > > > > > I am all up for cleaning up the codebase if we have something of > > > > > this nature. My 2c. > > > > > > > > > > Thanks & Regards, > > > > > Amogh Desai > > > > > > > > > > > > > > > On Tue, Jul 2, 2024 at 4:29 AM Daniel Standish > > > > > <daniel.stand...@astronomer.io.invalid> wrote: > > > > > > > > > >> Sometimes in software, we try to be helpful by doing a bunch of > > work for > > > > >> the user, but it ends up causing a bunch of confusion and > > maintenance > > > > >> headaches. I think these decorators are an example of that. > > > > >> > > > > >> What `unify_bucket_name_and_key` does is, it tries to make it so you > > > > could > > > > >> pass the full S3 URI to `key`, or you can provide them separately as > > > > bucket > > > > >> and key, and the decorator will "figure it out". > > > > >> > > > > >> Then this decorator can also be combined with another one, > > > > >> @provide_bucket_name, which, if you don't provide the bucket name, > > it > > > > will > > > > >> grab it from an airflow connection. > > > > >> > > > > >> Then, sometimes the params are again passed through another function > > > > >> `get_s3_bucket_key`. > > > > >> > > > > >> This requires us to add tests to ensure that the decorators are > > applied > > > > in > > > > >> the correct order, and now, to handle the various async cases (the > > > > >> decorated fn may be either coroutine or async iterator). > > > > >> > > > > >> It's a bunch of code and complexity that, really does not need to > > exist. > > > > >> I.e. what if bucket were simply bucket, and key were simply key, > > always? > > > > >> > > > > >> If it were up to me, I'd just remove all of it. Just force the > > user to > > > > >> supply a bucket and key separately. > > > > >> > > > > >> We can of course do this since semver allows it. But some portion > > of > > > > users > > > > >> will be depending on this change and will have to change their code. > > > > >> > > > > >> Curious what others think. > > > > >> > > > > > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org For additional commands, e-mail: dev-h...@airflow.apache.org