Hi Daniel,

Are you talking about deprecating the decorator `@provide_bucket_name` or 
deprecating the decorator `@provide_bucket_name` AND the underlying feature of 
specifying a bucket through the connection? These are two different things, so 
please be specific. From your original email I understand you are talking about 
only the operators but from the last email I understand you want both.

If this is the latter, I agree with you that if we deprecate the decorator 
`provide_bucket_name`, we should also deprecate the underlying feature of 
specifying a bucket through the connection. What I was trying to explain in my 
email is we should not deprecate one and not the other. To the question "should 
we deprecate the feature of specifying a bucket through a connection", I dont 
have enough data to know how much this feature is used in Airflow. I am +0 on 
this one.

On 2024/07/03 21:40:12 Daniel Standish wrote:
> Thanks Vincent
> 
> Re provide_bucket_name
> 
> So, you've described what it does, but have not really made an argument
> that we should keep it.
> 
> There may be good arguments for keeping it.  Indeed, if it were an obvious
> choice to remove, I would not have brought it to the list!  But, the mere
> fact that it is there, is not really an argument for keeping it, because
> changing connection structure and hook or operator behavior is a routine
> thing that we can do in accordance with semver.
> 
> Our decisions here would possibly be aided if we tracked the usage of this
> kind of thing with scarf.  In lieu of that, one option is to deprecate it
> (raise a warning "this will be removed" when user is depending on the
> bucket in the connection) and then see how much push back we get from
> users...
> 
> 
> 
> 
> 
> 
> 
> On Wed, Jul 3, 2024 at 11:25 AM Vincent Beck <vincb...@apache.org> wrote:
> 
> > Overall, I agree with Daniel that these decorators can be very confusing
> > (as user and maintainers).
> >
> > - `unify_bucket_name_and_key`. I am +1 to remove it. To me it does not
> > offer real value and is a just a "hack" to presumably make user life
> > easier. I prefer having a clear API with separate parameter for
> > `bucket_name` and `key` than some kind of magic where `key` can be a full
> > S3 URI or the file key.
> > - `provide_bucket_name`. This one I think we need to keep it. As part of
> > the AWS connection in Airflow, you can specify a S3 bucket:
> > https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html#s3-bucket-configurations.
> > I am not entirely sure of the intent behind but users can specify buckets
> > as part of their connections. One (convenient way) for users then to use
> > this bucket information is to use the decorator `provide_bucket_name` which
> > then pass the bucket name down to the function decorated with
> > `provide_bucket_name`
> >
> > On 2024/07/02 09:51:38 Jarek Potiuk wrote:
> > > I think it's mostly internal Amazon/ Alibaba thing - and I'd defer and
> > > mostly follow suggestions of those who have most stakes there - i.e.
> > Amazon
> > > team for amazon provider, alibaba users/team for Alibaba one. I am also
> > +0
> > > on that one - sims like simplifying things is a good idea, also
> > decorators
> > > there are mostly for internal use - for operators (unless I am mistaken)
> > -
> > > so back-compatibility seems small.
> > > From the "Airflow team" perspective I think one thing is important - how
> > it
> > > interacts with FSSPEC - because as of recently it's the "common" way we
> > > suggest our users interact with object storages. If we can align it and
> > > make the "one way" following the Amazon's / Alibaba FSSPEC integration,
> > > that would be ideal. But I am not sure if this can be done.
> > >
> > > J.
> > >
> > >
> > >
> > >
> > > On Tue, Jul 2, 2024 at 10:26 AM Wei Lee <weilee...@gmail.com> wrote:
> > >
> > > > It looks like only Alibaba and Amazon providers use these decorators.
> > > > Removing them seems like simplifying things, but I’m not the actual
> > user of
> > > > these features. I would vote a +0 for removing them. We probably only
> > need
> > > > to bump the major version of these providers.
> > > >
> > > > Best,
> > > > Wei
> > > >
> > > > > On Jul 2, 2024, at 12:23 PM, Amogh Desai <amoghdesai....@gmail.com>
> > > > wrote:
> > > > >
> > > > > I understand that having multiple ways of doing the same thing is not
> > > > > needed and we need
> > > > > to slowly and eventually clean up these things to simplify the
> > codebase
> > > > and
> > > > > also make it
> > > > > maintainable.
> > > > >
> > > > > Aligning with one of the principles of Airflow3, we need to simplify
> > > > > things, nuke unnecessary
> > > > > stuff, remove deprecated code etc. This will have multiple benefits
> > for
> > > > the
> > > > > ecosystem:
> > > > > 1. Maintainability is not a headache. We will know for sure that
> > users
> > > > are
> > > > > using certain
> > > > > things in a certain way.
> > > > > 2. It will make it easier for us to make some decisions like this one
> > > > > without major discussions
> > > > > and edge case considerations because we know how users are using
> > certain
> > > > > patterns (aligns with
> > > > > point 1)
> > > > > 3. Easier for new contributors to contribute to :)
> > > > >
> > > > > I am all up for cleaning up the codebase if we have something of
> > > > > this nature. My 2c.
> > > > >
> > > > > Thanks & Regards,
> > > > > Amogh Desai
> > > > >
> > > > >
> > > > > On Tue, Jul 2, 2024 at 4:29 AM Daniel Standish
> > > > > <daniel.stand...@astronomer.io.invalid> wrote:
> > > > >
> > > > >> Sometimes in software, we try to be helpful by doing a bunch of
> > work for
> > > > >> the user, but it ends up causing a bunch of confusion and
> > maintenance
> > > > >> headaches.  I think these decorators are an example of that.
> > > > >>
> > > > >> What `unify_bucket_name_and_key` does is, it tries to make it so you
> > > > could
> > > > >> pass the full S3 URI to `key`, or you can provide them separately as
> > > > bucket
> > > > >> and key, and the decorator will "figure it out".
> > > > >>
> > > > >> Then this decorator can also be combined with another one,
> > > > >> @provide_bucket_name, which, if you don't provide the bucket name,
> > it
> > > > will
> > > > >> grab it from an airflow connection.
> > > > >>
> > > > >> Then, sometimes the params are again passed through another function
> > > > >> `get_s3_bucket_key`.
> > > > >>
> > > > >> This requires us to add tests to ensure that the decorators are
> > applied
> > > > in
> > > > >> the correct order, and now, to handle the various async cases (the
> > > > >> decorated fn may be either coroutine or async iterator).
> > > > >>
> > > > >> It's a bunch of code and complexity that, really does not need to
> > exist.
> > > > >> I.e. what if bucket were simply bucket, and key were simply key,
> > always?
> > > > >>
> > > > >> If it were up to me, I'd just remove all of it.  Just force the
> > user to
> > > > >> supply a bucket and key separately.
> > > > >>
> > > > >> We can of course do this since semver allows it.  But some portion
> > of
> > > > users
> > > > >> will be depending on this change and will have to change their code.
> > > > >>
> > > > >> Curious what others think.
> > > > >>
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > > >
> > > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > For additional commands, e-mail: dev-h...@airflow.apache.org
> >
> >
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to