Overall, I agree with Daniel that these decorators can be very confusing (as user and maintainers).
- `unify_bucket_name_and_key`. I am +1 to remove it. To me it does not offer real value and is a just a "hack" to presumably make user life easier. I prefer having a clear API with separate parameter for `bucket_name` and `key` than some kind of magic where `key` can be a full S3 URI or the file key. - `provide_bucket_name`. This one I think we need to keep it. As part of the AWS connection in Airflow, you can specify a S3 bucket: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html#s3-bucket-configurations. I am not entirely sure of the intent behind but users can specify buckets as part of their connections. One (convenient way) for users then to use this bucket information is to use the decorator `provide_bucket_name` which then pass the bucket name down to the function decorated with `provide_bucket_name` On 2024/07/02 09:51:38 Jarek Potiuk wrote: > I think it's mostly internal Amazon/ Alibaba thing - and I'd defer and > mostly follow suggestions of those who have most stakes there - i.e. Amazon > team for amazon provider, alibaba users/team for Alibaba one. I am also +0 > on that one - sims like simplifying things is a good idea, also decorators > there are mostly for internal use - for operators (unless I am mistaken) - > so back-compatibility seems small. > From the "Airflow team" perspective I think one thing is important - how it > interacts with FSSPEC - because as of recently it's the "common" way we > suggest our users interact with object storages. If we can align it and > make the "one way" following the Amazon's / Alibaba FSSPEC integration, > that would be ideal. But I am not sure if this can be done. > > J. > > > > > On Tue, Jul 2, 2024 at 10:26 AM Wei Lee <weilee...@gmail.com> wrote: > > > It looks like only Alibaba and Amazon providers use these decorators. > > Removing them seems like simplifying things, but I’m not the actual user of > > these features. I would vote a +0 for removing them. We probably only need > > to bump the major version of these providers. > > > > Best, > > Wei > > > > > On Jul 2, 2024, at 12:23 PM, Amogh Desai <amoghdesai....@gmail.com> > > wrote: > > > > > > I understand that having multiple ways of doing the same thing is not > > > needed and we need > > > to slowly and eventually clean up these things to simplify the codebase > > and > > > also make it > > > maintainable. > > > > > > Aligning with one of the principles of Airflow3, we need to simplify > > > things, nuke unnecessary > > > stuff, remove deprecated code etc. This will have multiple benefits for > > the > > > ecosystem: > > > 1. Maintainability is not a headache. We will know for sure that users > > are > > > using certain > > > things in a certain way. > > > 2. It will make it easier for us to make some decisions like this one > > > without major discussions > > > and edge case considerations because we know how users are using certain > > > patterns (aligns with > > > point 1) > > > 3. Easier for new contributors to contribute to :) > > > > > > I am all up for cleaning up the codebase if we have something of > > > this nature. My 2c. > > > > > > Thanks & Regards, > > > Amogh Desai > > > > > > > > > On Tue, Jul 2, 2024 at 4:29 AM Daniel Standish > > > <daniel.stand...@astronomer.io.invalid> wrote: > > > > > >> Sometimes in software, we try to be helpful by doing a bunch of work for > > >> the user, but it ends up causing a bunch of confusion and maintenance > > >> headaches. I think these decorators are an example of that. > > >> > > >> What `unify_bucket_name_and_key` does is, it tries to make it so you > > could > > >> pass the full S3 URI to `key`, or you can provide them separately as > > bucket > > >> and key, and the decorator will "figure it out". > > >> > > >> Then this decorator can also be combined with another one, > > >> @provide_bucket_name, which, if you don't provide the bucket name, it > > will > > >> grab it from an airflow connection. > > >> > > >> Then, sometimes the params are again passed through another function > > >> `get_s3_bucket_key`. > > >> > > >> This requires us to add tests to ensure that the decorators are applied > > in > > >> the correct order, and now, to handle the various async cases (the > > >> decorated fn may be either coroutine or async iterator). > > >> > > >> It's a bunch of code and complexity that, really does not need to exist. > > >> I.e. what if bucket were simply bucket, and key were simply key, always? > > >> > > >> If it were up to me, I'd just remove all of it. Just force the user to > > >> supply a bucket and key separately. > > >> > > >> We can of course do this since semver allows it. But some portion of > > users > > >> will be depending on this change and will have to change their code. > > >> > > >> Curious what others think. > > >> > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org For additional commands, e-mail: dev-h...@airflow.apache.org