This AIP will have a great positive impact on the project:
- Airflow will be increasingly used as a scheduler for ML projects.
- Simplifying the file transfer operators by replacing them with a single
one for all the file/object storage services.
- Implementing and managing the CI/CD pipelines for Airflow DAGs will be
much easier once we use the new feature in the DAGs processor (out of the
scope of this AIP but will be possible).
- Opportunity to support a generic XCom backend based on AFS, helpful in
sharing big files between the tasks and creating dynamic tasks from files
(out of the scope of this AIP but will be possible).

+1 (binding)

My only concern is the stability of the fsspec packages. I had a bad
experience with s3fs and gcsfs in the past due to patch/minor releases with
breaking changes, or conflicts with botocore in the case of s3fs. I hope
release management has improved since.
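
For anyone reading this thread later, here is a minimal sketch of why a
single transfer operator looks feasible. It is an illustration only, not code
from the AIP or the PR: it assumes nothing beyond fsspec's public
`fsspec.open` call, the `transfer` helper and the paths are made up, and it
uses fsspec's built-in in-memory filesystem so no cloud credentials are
needed. With s3fs or gcsfs installed, the same helper should accept `s3://`
or `gs://` URLs, but that part is not exercised here.

```python
import shutil

import fsspec


def transfer(src_url: str, dst_url: str) -> None:
    """Copy one file between any two fsspec-supported locations.

    Hypothetical helper for illustration; it is not the AIP's operator.
    """
    # fsspec picks the backend from the URL scheme, so the copy logic
    # itself does not need to know which storage service is involved.
    with fsspec.open(src_url, "rb") as src, fsspec.open(dst_url, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Seed a source file in the in-memory filesystem.
with fsspec.open("memory://incoming/report.csv", "wb") as f:
    f.write(b"id,value\n1,42\n")

transfer("memory://incoming/report.csv", "memory://archive/report.csv")

# Read the copy back to confirm the bytes arrived unchanged.
with fsspec.open("memory://archive/report.csv", "rb") as f:
    copied = f.read()
```

The per-service differences end up in the fsspec backends, which is also why
their release stability matters so much here.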

On Fri, Oct 20, 2023 at 4:33 PM Bolke de Bruin <bdbr...@gmail.com> wrote:

> I have added an example for the use of the FileTransferOperator in the PR.
> This is a 'port' of the local_to_s3 dag that is used elsewhere in the
> examples. I kept the structure as per that original, but it could be
> reduced to a two-liner (in dag-speak).
>
> I agree with Jens that the PR needs to settle a bit; more on the
> implementation rather than the API imho - the API mostly comes from
> pathlib.Path + fsspec extensions (but please shoot at it!). I hope we can
> consider it 'settled enough' by the time this vote ends.
>
> Cheers
> Bolke
>
> On Fri, 20 Oct 2023 at 09:43, Scheffler Jens (XC-DX/ETV5)
> <jens.scheff...@de.bosch.com.invalid> wrote:
>
> > +1 (non binding) for making this AIP in general.
> >
> > I had a couple of comments and the rework and comments are very active. I
> > assume the PR needs to settle for a moment and there are still a lot of
> > different opinions - which is fair with the given complexity. The value is
> > very high but I fear a bit that we nail down the API a bit too fast. But it
> > is a feature that will need to stay and we need to make it "right". So I
> > propose either the PR stays for a moment to mature or we need to mark the
> > feature at least for one version to be "experimental" --> to have the
> > ability to adjust API if we learn in real life - not being "locked" into
> > API v1 for years.
> >
> > I also would like to see examples, but maybe I need to catch-up with all
> > the ongoing changes as well.
> >
> > THANKS for the efforts and the concepts Bolke!
> >
> > Mit freundlichen Grüßen / Best regards
> >
> > Jens Scheffler
> >
> > Deterministik open Loop (XC-DX/ETV5)
> > Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
> > GERMANY | http://www.bosch.com/
> > Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
> > jens.scheff...@de.bosch.com
> >
> > Registered office: Stuttgart, Registration court: Amtsgericht Stuttgart,
> > HRB 14000; Chairman of the Supervisory Board: Prof. Dr. Stefan
> > Asenkerschbaumer; Managing Directors: Dr. Stefan Hartung,
> > Dr. Christian Fischer, Dr. Markus Forschner, Stefan Grosch, Dr. Markus
> > Heyn, Dr. Tanja Rückert
> >
> > -----Original Message-----
> > From: Kaxil Naik <kaxiln...@gmail.com>
> > Sent: Friday, 20 October 2023 01:00
> > To: dev@airflow.apache.org
> > Subject: Re: [VOTE] AIP-58 Airflow ObjectStore
> >
> > I like where this is heading, so I vote *+1*.
> >
> > Although, I would like to see some examples of usage in DAGs (before/after
> > would be great) that will help support the following points that you have
> > mentioned in the AIP
> > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565#AIP58AirflowObjectStore(AS)-Whyisitneeded?>:
> >
> >    1. Simplify DAG CI/CD
> >    2. Streamlining pre-DAG to DAG (e.g. notebooks to DAG)
> >    3. To allow DAG processing to be using arbitrary locations (object
> >    storage)
> >    4. To have a unified interface to file operations in TaskFlow and
> >    traditional Operators
> >
> > and some comments:
> >
> >    1. You do have *lineage* listed in the image
> >    <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565#AIP58AirflowObjectStore(AS)-Whatchangedoyouproposetomake?>,
> >    but is it a follow-up work that you were thinking or was it part of AIP
> >    completion?
> >    2. We would contribute the File abstraction as a follow-up to this AIP
> >    too, which will help with the Dataset story too
> >
> >
> > Regards,
> > Kaxil
> >
> > On Thu, 19 Oct 2023 at 20:21, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >
> > > I don't mind waiting for that given a reasonable timeframe. Martin
> > > mentioned he wanted to do something at the end of the week. The vote
> > > to this AIP runs until next Thursday anyway :-).
> > >
> > > And thank you :-).
> > >
> > > B.
> > >
> > > On Thu, 19 Oct 2023 at 21:11, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > > > > One less worry I hope is that aiobotocore is actually starting to
> > > > > > relax its botocore requirements bringing it much closer to latest
> > > > > > release:
> > > > > > https://github.com/aio-libs/aiobotocore/pull/1037
> > > >
> > > > > Oh yes absolutely. Great timing. And our constraints ***JUST***
> > > > > caught up automatically with aiobotocore 2.7.0 - released just 2 days
> > > > > ago.
> > > > >
> > > > > We've been waiting for it for a long time and I believe the MWAA
> > > > > team had some impact there (we've been discussing it a lot).
> > > > >
> > > > > And yes that will hopefully change my +1 on AIP-58 to +1!  But only
> > > > > when s3fs relaxes THEIR requirement of aiobotocore ~2.5.4 they
> > > > > currently have.
> > > > > Currently just using s3fs will bring our botocore and aiobotocore in
> > > > > constraints 2.5 months back.
> > > >
> > > > < boto3==1.28.64
> > > > < botocore==1.31.64 -> released 16 Oct 2023
> > > > ---
> > > > > boto3==1.28.17
> > > > > botocore==1.31.17 -> released 1 Aug 2023
> > > >
> > > > > And it seems like everyone was waiting for it:
> > > > > https://github.com/fsspec/s3fs/pull/809 - the s3fs change for it was
> > > > > merged yesterday.
> > > >
> > > > > So yes +1! I hope the s3fs release will happen before we merge
> > > > > AIP-58.
> > > >
> > > > J.
> > > >
> > > >
> > > >
> > > > > On Thu, Oct 19, 2023 at 8:44 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > >
> > > > > > Thanks for the thorough consideration Jarek. I follow your concerns.
> > > > > > The idea behind this AIP was to reduce the cognitive load on users by
> > > > > > staying as pythonic as we can and to be gentle with the Airflow-isms.
> > > > > > So I hope to limit that "yet another abstraction". I do agree that
> > > > > > having great examples and documentation is going to be important. As
> > > > > > a random idea, this
> > > > > > https://medium.com/@fninsiima/de-mini-series-part-two-57770ff7cdf9
> > > > > > can now be significantly simplified.
> > > > >
> > > > > > One less worry I hope is that aiobotocore is actually starting to
> > > > > > relax its botocore requirements bringing it much closer to latest
> > > > > > release:
> > > > > > https://github.com/aio-libs/aiobotocore/pull/1037
> > > > >
> > > > > > On the requirements side there are actually not that many
> > > > > > additional dependencies being brought in.
> > > > > > Core fsspec does not bring any requirements. s3fs brings in three
> > > > > > which are all covered by current ones.
> > > > > > adlfs brings in five, all already part of our current set. Of
> > > > > > course it does bring some complexity, but I do hope you see that
> > > > > > it is fairly limited and if it does bring in anything it is well
> > > > > > supported.
> > > > >
> > > > > > The reason for creating common.io as a provider was that it was
> > > > > > suggested that we might want to move a bit faster than core on the
> > > > > > very simple (yet powerful ;-) ) FileTransferOperator.
> > > > > >
> > > > > > Considering this I hope you would like to make your measly +1 into
> > > > > > a strong +1 :-).
> > > > >
> > > > > Cheers
> > > > > Bolke
> > > > >
> > > > >
> > > > > > On Thu, 19 Oct 2023 at 19:48, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > >
> > > > > > > Finally caught up with this one, looked through code and
> > > > > > > discussions. I am a little torn on that one but I did some more
> > > > > > > research and I think it's a useful abstraction.
> > > > > >
> > > > > > +1(binding)
> > > > > >
> > > > > > The big + of using fsspec is that it is already supported by the
> > > > > > most important "consumers" that are likely to be used in
> > > > > > Airflow. Pandas, Pyarrow, Iceberg. The fact that you will be
> > > > > > able to take an S3/GCS ObjectStoragePath as an input directly
> > > > > > > and it will transparently use the connection of Airflow is a big
> > > > > > > plus.
> > > > > >
> > > > > > > I would just add that we should get real-life DAG examples on
> > > > > > > how this might simplify code of their DAGs, it's cool. I think the
> > > > > > > quality and clarity of the documentation that will come with it
> > > > > > > - clearly explaining some cases and examples on how DAG authors
> > > > > > > can make use of it to make their DAG authoring "better" - is a
> > > > > > > key to success of this one. If we fail to explain it, it might
> > > > > > > become yet another rarely used feature of Airflow.
> > > > > >
> > > > > > > There is one worry I have - it adds "yet another abstraction" to
> > > > > > > learn and "yet another set of dependencies" to Airflow.  We have a
> > > > > > > new "common.io" provider, we have many new dependencies, we have
> > > > > > > aiobotocore as a requirement for AWS integration for example. I
> > > > > > > already looked at the PR and attempted to help with some of the
> > > > > > > dependency questions and problems, but we will have a few more of
> > > > > > > those to solve and some decisions to make: should
> > > > > > > apache-airflow-provider-common-io be default? Should it be
> > > > > > > included in the reference image? etc. etc. This will make Airflow
> > > > > > > and its dependencies more complex rather than simpler. That's why
> > > > > > > I am not strong +1! just measly +1 - because I see how it can make
> > > > > > > airflow even "heavier" than it is now.
> > > > > >
> > > > > > J.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Thu, Oct 19, 2023 at 4:34 PM Igor Kholopov
> > > > > > > <ikholo...@google.com.invalid> wrote:
> > > > > >
> > > > > > > Thanks for incorporating the feedback!
> > > > > > >
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > > On Thu, Oct 19, 2023 at 1:55 PM Dennis Akpenyi
> > > > > > > > <dennisakpe...@gmail.com> wrote:
> > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > > On Thu, Oct 19, 2023 at 12:24 PM Bolke de Bruin
> > > > > > > > > <bdbr...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Dear Community,
> > > > > > > > >
> > > > > > > > > > I would like to start a vote for "AIP-58 Add Airflow
> > > > > > > > > > ObjectStore".
> > > > > > > > >
> > > > > > > > > You can find the AIP here:
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > https://cwik/
> > > i.apache.org%2Fconfluence%2Fpages%2Fviewpage.action%3FpageId%3D2634305
> > > 65&data=05%7C01%7CJens.Scheffler%40de.bosch.com%7C83c763cbafcc482cf892
> > > 08dbd0f73419%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7C638333532372
> > > 153493%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJ
> > > BTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=yhPY8Cyti%2FHq%2BIGb
> > > QNQHFhl1s5rvTGiMwdI1gxl5Lu8%3D&reserved=0
> > > > > > > > >
> > > > > > > > > Implementing PR (most of the discussion happened here):
> > > > > > > > >
> > > > > > > > > > https://github.com/apache/airflow/pull/34729
> > > > > > > > >
> > > > > > > > > Discussion Thread (not much has happened here :-) ):
> > > > > > > > > Note: the title has changed from its original.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > https://lists.apache.org/thread/l3fkr0h6j2g4tlmsov14fywmj58t3mtp
> > > > > > > > >
> > > > > > > > > > This is my binding +1. The vote will last until 12:00 UTC
> > > > > > > > > > on 26th October, and until at least 3 binding votes have
> > > > > > > > > > been cast.
> > > > > > > > >
> > > > > > > > > Please vote accordingly:
> > > > > > > > >
> > > > > > > > > [ ] + 1 approve
> > > > > > > > > [ ] + 0 no opinion
> > > > > > > > > [ ] - 1 disapprove with the reason
> > > > > > > > >
> > > > > > > > > > Only votes from PMC members and committers are binding,
> > > > > > > > > > but other members of the community are encouraged to check
> > > > > > > > > > the AIP and vote with "(non-binding)".
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > > Bolke
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Bolke de Bruin
> > > > > > > > > bdbr...@gmail.com
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > --
> > > > > Bolke de Bruin
> > > > > bdbr...@gmail.com
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > --
> > > Bolke de Bruin
> > > bdbr...@gmail.com
> > >
> >
>
>
> --
>
> --
> Bolke de Bruin
> bdbr...@gmail.com
>
