> > Are you talking about deprecating the decorator `@provide_bucket_name` or > deprecating the decorator `@provide_bucket_name` AND the underlying feature > of specifying a bucket through the connection?
Yes! Sorry, I did not make that clear. And yeah, that's probably a better way to frame this discussion. The proposal is (1) don't apply "unify bucket and key logic" for user and (2) don't "provide bucket name" from connection anymore -- force users to provide them explicitly when using the hook directly. I was not thinking that we would deprecate the decorator and keep the feature. But that's an interesting thought. And it makes me think of another option that would potentially reduce pain for users. That is, we could choose to preserve the functionality at the operator level; if any s3 operators were relying on this functionality, then we could keep the logic so that the operator would still check the connection for the "default bucket". Doing this kind of thing at the operator level does seem a little less problematic to me. And perhaps being more explicit about it in the code, and perhaps even logging "falling back to default bucket" at runtime. To the question "should we deprecate the feature of specifying a bucket > through a connection", I dont have enough data to know how much this > feature is used in Airflow. I am +0 on this one. Yeah I don't have data either. I think we can assume it's not negligible. Like, small but significant would be my guess. Just thinking about the feature and it's value... As a data engineer, long ago, I knew about this behavior but did not really like it and preferred being explicit about the bucket being used in any particular operation (as opposed to leaving it to the connection). I preferred that the connection be about auth and the bucket / key be specified explicitly. It feels a little iffy to leave it to the connection. And, in practice, often you will use the same creds / connection for many different buckets, so I don't think it's really all that helpful. But there are many different viewpoints on this type of thing. On Thu, Jul 4, 2024 at 7:47 AM Vincent Beck <vincb...@apache.org> wrote: > Hi Daniel, > > Are you talking about deprecating the decorator `@provide_bucket_name` or > deprecating the decorator `@provide_bucket_name` AND the underlying feature > of specifying a bucket through the connection? These are two different > things, so please be specific. From your original email I understand you > are talking about only the operators but from the last email I understand > you want both. > > If this is the latter, I agree with you that if we deprecate the decorator > `provide_bucket_name`, we should also deprecate the underlying feature of > specifying a bucket through the connection. What I was trying to explain in > my email is we should not deprecate one and not the other. To the question > "should we deprecate the feature of specifying a bucket through a > connection", I dont have enough data to know how much this feature is used > in Airflow. I am +0 on this one. > > On 2024/07/03 21:40:12 Daniel Standish wrote: > > Thanks Vincent > > > > Re provide_bucket_name > > > > So, you've described what it does, but have not really made an argument > > that we should keep it. > > > > There may be good arguments for keeping it. Indeed, if it were an > obvious > > choice to remove, I would not have brought it to the list! But, the mere > > fact that it is there, is not really an argument for keeping it, because > > changing connection structure and hook or operator behavior is a routine > > thing that we can do in accordance with semver. > > > > Our decisions here would possibly be aided if we tracked the usage of > this > > kind of thing with scarf. In lieu of that, one option is to deprecate it > > (raise a warning "this will be removed" when user is depending on the > > bucket in the connection) and then see how much push back we get from > > users... > > > > > > > > > > > > > > > > On Wed, Jul 3, 2024 at 11:25 AM Vincent Beck <vincb...@apache.org> > wrote: > > > > > Overall, I agree with Daniel that these decorators can be very > confusing > > > (as user and maintainers). > > > > > > - `unify_bucket_name_and_key`. I am +1 to remove it. To me it does not > > > offer real value and is a just a "hack" to presumably make user life > > > easier. I prefer having a clear API with separate parameter for > > > `bucket_name` and `key` than some kind of magic where `key` can be a > full > > > S3 URI or the file key. > > > - `provide_bucket_name`. This one I think we need to keep it. As part > of > > > the AWS connection in Airflow, you can specify a S3 bucket: > > > > https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html#s3-bucket-configurations > . > > > I am not entirely sure of the intent behind but users can specify > buckets > > > as part of their connections. One (convenient way) for users then to > use > > > this bucket information is to use the decorator `provide_bucket_name` > which > > > then pass the bucket name down to the function decorated with > > > `provide_bucket_name` > > > > > > On 2024/07/02 09:51:38 Jarek Potiuk wrote: > > > > I think it's mostly internal Amazon/ Alibaba thing - and I'd defer > and > > > > mostly follow suggestions of those who have most stakes there - i.e. > > > Amazon > > > > team for amazon provider, alibaba users/team for Alibaba one. I am > also > > > +0 > > > > on that one - sims like simplifying things is a good idea, also > > > decorators > > > > there are mostly for internal use - for operators (unless I am > mistaken) > > > - > > > > so back-compatibility seems small. > > > > From the "Airflow team" perspective I think one thing is important - > how > > > it > > > > interacts with FSSPEC - because as of recently it's the "common" way > we > > > > suggest our users interact with object storages. If we can align it > and > > > > make the "one way" following the Amazon's / Alibaba FSSPEC > integration, > > > > that would be ideal. But I am not sure if this can be done. > > > > > > > > J. > > > > > > > > > > > > > > > > > > > > On Tue, Jul 2, 2024 at 10:26 AM Wei Lee <weilee...@gmail.com> wrote: > > > > > > > > > It looks like only Alibaba and Amazon providers use these > decorators. > > > > > Removing them seems like simplifying things, but I’m not the actual > > > user of > > > > > these features. I would vote a +0 for removing them. We probably > only > > > need > > > > > to bump the major version of these providers. > > > > > > > > > > Best, > > > > > Wei > > > > > > > > > > > On Jul 2, 2024, at 12:23 PM, Amogh Desai < > amoghdesai....@gmail.com> > > > > > wrote: > > > > > > > > > > > > I understand that having multiple ways of doing the same thing > is not > > > > > > needed and we need > > > > > > to slowly and eventually clean up these things to simplify the > > > codebase > > > > > and > > > > > > also make it > > > > > > maintainable. > > > > > > > > > > > > Aligning with one of the principles of Airflow3, we need to > simplify > > > > > > things, nuke unnecessary > > > > > > stuff, remove deprecated code etc. This will have multiple > benefits > > > for > > > > > the > > > > > > ecosystem: > > > > > > 1. Maintainability is not a headache. We will know for sure that > > > users > > > > > are > > > > > > using certain > > > > > > things in a certain way. > > > > > > 2. It will make it easier for us to make some decisions like > this one > > > > > > without major discussions > > > > > > and edge case considerations because we know how users are using > > > certain > > > > > > patterns (aligns with > > > > > > point 1) > > > > > > 3. Easier for new contributors to contribute to :) > > > > > > > > > > > > I am all up for cleaning up the codebase if we have something of > > > > > > this nature. My 2c. > > > > > > > > > > > > Thanks & Regards, > > > > > > Amogh Desai > > > > > > > > > > > > > > > > > > On Tue, Jul 2, 2024 at 4:29 AM Daniel Standish > > > > > > <daniel.stand...@astronomer.io.invalid> wrote: > > > > > > > > > > > >> Sometimes in software, we try to be helpful by doing a bunch of > > > work for > > > > > >> the user, but it ends up causing a bunch of confusion and > > > maintenance > > > > > >> headaches. I think these decorators are an example of that. > > > > > >> > > > > > >> What `unify_bucket_name_and_key` does is, it tries to make it > so you > > > > > could > > > > > >> pass the full S3 URI to `key`, or you can provide them > separately as > > > > > bucket > > > > > >> and key, and the decorator will "figure it out". > > > > > >> > > > > > >> Then this decorator can also be combined with another one, > > > > > >> @provide_bucket_name, which, if you don't provide the bucket > name, > > > it > > > > > will > > > > > >> grab it from an airflow connection. > > > > > >> > > > > > >> Then, sometimes the params are again passed through another > function > > > > > >> `get_s3_bucket_key`. > > > > > >> > > > > > >> This requires us to add tests to ensure that the decorators are > > > applied > > > > > in > > > > > >> the correct order, and now, to handle the various async cases > (the > > > > > >> decorated fn may be either coroutine or async iterator). > > > > > >> > > > > > >> It's a bunch of code and complexity that, really does not need > to > > > exist. > > > > > >> I.e. what if bucket were simply bucket, and key were simply key, > > > always? > > > > > >> > > > > > >> If it were up to me, I'd just remove all of it. Just force the > > > user to > > > > > >> supply a bucket and key separately. > > > > > >> > > > > > >> We can of course do this since semver allows it. But some > portion > > > of > > > > > users > > > > > >> will be depending on this change and will have to change their > code. > > > > > >> > > > > > >> Curious what others think. > > > > > >> > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > > > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > For additional commands, e-mail: dev-h...@airflow.apache.org > >