I have added an example of using the FileTransferOperator to the PR. It is a 'port' of the local_to_s3 DAG that is used elsewhere in the examples. I kept the structure of the original, but it could be reduced to a two-liner (in DAG-speak).
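The API discussed in this thread is described as mostly following pathlib.Path, so a rough feel for that style may help. Below is a minimal sketch of a copy written purely with path methods. It uses only the standard library's pathlib against a temporary directory; it is not the FileTransferOperator or ObjectStoragePath code from the PR, and the helper name copy_file and the "local"/"bucket" directory names are hypothetical stand-ins.

```python
import tempfile
from pathlib import Path


def copy_file(src: Path, dst: Path) -> int:
    """Copy src to dst using only path methods; return the bytes written.

    Hypothetical helper for illustration, not part of Airflow.
    """
    # Create the destination "prefix" if it does not exist yet.
    dst.parent.mkdir(parents=True, exist_ok=True)
    return dst.write_bytes(src.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # A local source file.
    src = base / "local" / "report.csv"
    src.parent.mkdir(parents=True)
    src.write_bytes(b"id,value\n1,42\n")

    # A directory standing in for an object-store bucket (assumption).
    dst = base / "bucket" / "incoming" / src.name

    written = copy_file(src, dst)
    copied = dst.read_bytes()
```

The point of the sketch is only that source and destination are both plain path objects, so the copy itself does not need to know where either side lives.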
I agree with Jens that the PR needs to settle a bit, more on the implementation than on the API imho: the API mostly comes from pathlib.Path plus fsspec extensions (but please shoot at it!). I hope we can consider it 'settled enough' by the time this vote ends.

Cheers
Bolke

On Fri, 20 Oct 2023 at 09:43, Scheffler Jens (XC-DX/ETV5) <jens.scheff...@de.bosch.com.invalid> wrote:

> +1 (non binding) for making this AIP in general.
>
> I had a couple of comments and the rework and comments are very active. I
> assume the PR needs to settle for a moment and there are still a lot of
> different opinions - which is fair with the given complexity. The value is
> very high but I fear a bit that we nail down the API a bit too fast. But it
> is a feature that will need to stay and we need to make it "right". So I
> propose either the PR stays for a moment to mature or we mark the feature
> as "experimental" for at least one version, so that we can adjust the API
> as we learn in real life and are not "locked" into API v1 for years.
>
> I also would like to see examples, but maybe I need to catch up with all
> the ongoing changes as well.
>
> THANKS for the efforts and the concepts Bolke!
>
> Best regards
>
> Jens Scheffler
> Deterministik open Loop (XC-DX/ETV5), Robert Bosch GmbH
>
> -----Original Message-----
> From: Kaxil Naik <kaxiln...@gmail.com>
> Sent: Friday, 20 October 2023 01:00
> To: dev@airflow.apache.org
> Subject: Re: [VOTE] AIP-58 Airflow ObjectStore
>
> I like where this is heading, so I vote *+1*.
>
> Although, I would like to see some examples of usage in DAGs (before/after
> would be great) that will help support the following points that you have
> mentioned in the AIP
> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565#AIP58AirflowObjectStore(AS)-Whyisitneeded?>:
>
> 1. Simplify DAG CI/CD
> 2. Streamlining pre-DAG to DAG (e.g. notebooks to DAG)
> 3. To allow DAG processing to be using arbitrary locations (object storage)
> 4. To have a unified interface to file operations in TaskFlow and
>    traditional Operators
>
> and some comments:
>
> 1. You do have *lineage* listed in the image
>    <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565#AIP58AirflowObjectStore(AS)-Whatchangedoyouproposetomake?>,
>    but is it a follow-up work that you were thinking or was it part of AIP
>    completion?
> 2. We would contribute the File abstraction as a follow-up to this AIP
>    too, which will help with the Dataset story too
>
> Regards,
> Kaxil
>
> On Thu, 19 Oct 2023 at 20:21, Bolke de Bruin <bdbr...@gmail.com> wrote:
>
> > I don't mind waiting for that given a reasonable timeframe. Martin
> > mentioned he wanted to do something at the end of the week. The vote
> > to this AIP runs until next Thursday anyway :-).
> >
> > And thank you :-).
> >
> > B.
> >
> > On Thu, 19 Oct 2023 at 21:11, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > > One less worry I hope is that aiobotocore is actually starting to
> > > > relax its botocore requirements bringing it much closer to latest
> > > > release: https://github.com/aio-libs/aiobotocore/pull/1037
> > >
> > > Oh yes absolutely. Great timing. And our constraints ***JUST***
> > > caught up automatically with aiobotocore 2.7.0 - released just 2 days
> > > ago.
> > >
> > > We've been waiting for it for a long time and I believe the MWAA
> > > team had some impact there (we've been discussing it a lot).
> > >
> > > And yes that will hopefully change my +1 on AIP-58 to +1! But only
> > > when s3fs relaxes THEIR requirement of aiobotocore ~2.5.4 they
> > > currently have. Currently just using s3fs will bring our botocore and
> > > aiobotocore in constraints 2.5 months back.
> > >
> > > < boto3==1.28.64
> > > < botocore==1.31.64 -> released 16 Oct 2023
> > > ---
> > > > boto3==1.28.17
> > > > botocore==1.31.17 -> released 1 Aug 2023
> > >
> > > And it seems like everyone was waiting for it:
> > > https://github.com/fsspec/s3fs/pull/809 - the s3fs change for it was
> > > merged yesterday.
> > >
> > > So yes +1! I hope the s3fs release will happen before we merge AIP-58.
> > >
> > > J.
> > >
> > > On Thu, Oct 19, 2023 at 8:44 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> > >
> > > > Thanks for the thorough consideration, Jarek. I follow your concerns.
> > > > The idea behind this AIP was to reduce the cognitive load on users by
> > > > staying as pythonic as we can and to be gentle with the Airflow-isms.
> > > > So I hope to limit that "yet another abstraction". I do agree that
> > > > having great examples and documentation is going to be important. As
> > > > a random idea, this
> > > > https://medium.com/@fninsiima/de-mini-series-part-two-57770ff7cdf9
> > > > can now be significantly simplified.
> > > >
> > > > One less worry I hope is that aiobotocore is actually starting to
> > > > relax its botocore requirements bringing it much closer to latest
> > > > release: https://github.com/aio-libs/aiobotocore/pull/1037
> > > >
> > > > On the requirements side there are actually not that many
> > > > additional dependencies being brought in. Core fsspec does not bring
> > > > any requirements. s3fs brings in three which are all covered by
> > > > current ones. adlfs brings in five, all already part of our current
> > > > set.
> > > > Of course it does bring some complexity, but I do hope you see that
> > > > it is fairly limited and if it does bring in anything it is well
> > > > supported.
> > > >
> > > > The reason for creating common.io as a provider was that it was
> > > > suggested that we might want to move a bit faster than core on the
> > > > very simple (yet powerful ;-) ) FileTransferOperator.
> > > >
> > > > Considering this I hope you would like to make your measly +1 into
> > > > a strong +1 :-).
> > > >
> > > > Cheers
> > > > Bolke
> > > >
> > > > On Thu, 19 Oct 2023 at 19:48, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > >
> > > > > Finally caught up with this one, looked through code and
> > > > > discussions. I am a little torn on that one but I did some more
> > > > > research and I think it's a useful abstraction.
> > > > >
> > > > > +1(binding)
> > > > >
> > > > > The big + of using fsspec is that it is already supported by the
> > > > > most important "consumers" that are likely to be used in
> > > > > Airflow. Pandas, Pyarrow, Iceberg. The fact that you will be
> > > > > able to take an S3/GCS ObjectStoragePath as an input directly
> > > > > and it will transparently use the connection of Airflow is a big
> > > > > plus.
> > > > >
> > > > > I would just add that we should get real-life DAG examples on
> > > > > how this might simplify code of their DAGs, it's cool. I think the
> > > > > quality and clarity of the documentation that will come with it
> > > > > - clearly explaining some cases and examples on how DAG authors
> > > > > can make use of it to make their DAG authoring "better" - is a key
> > > > > to success of this one.
> > > > > If we fail to explain it, it might become yet another rarely used
> > > > > feature of Airflow.
> > > > >
> > > > > There is one worry I have - it adds "yet another abstraction" to
> > > > > learn and "yet another set of dependencies" to Airflow. We have a
> > > > > new "common.io" provider, we have many new dependencies, we have
> > > > > aiobotocore as a requirement for AWS integration for example. I
> > > > > already looked at the PR and attempted to help with some of the
> > > > > dependency questions and problems, but we will have a few more of
> > > > > those to solve and some decisions to make: should
> > > > > apache-airflow-provider-common-io be default? Should it be
> > > > > included in the reference image? etc. etc. This will make Airflow
> > > > > and its dependencies more complex rather than simpler. That's why
> > > > > I am not strong +1! just measly +1 - because I see how it can make
> > > > > airflow even "heavier" than it is now.
> > > > >
> > > > > J.
> > > > >
> > > > > On Thu, Oct 19, 2023 at 4:34 PM Igor Kholopov <ikholo...@google.com.invalid> wrote:
> > > > >
> > > > > > Thanks for incorporating the feedback!
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > On Thu, Oct 19, 2023 at 1:55 PM Dennis Akpenyi <dennisakpe...@gmail.com> wrote:
> > > > > >
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > On Thu, Oct 19, 2023 at 12:24 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Dear Community,
> > > > > > > >
> > > > > > > > I would like to start a vote for "AIP-58 Add Airflow ObjectStore".
> > > > > > > >
> > > > > > > > You can find the AIP here:
> > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565
> > > > > > > >
> > > > > > > > Implementing PR (most of the discussion happened here):
> > > > > > > > https://github.com/apache/airflow/pull/34729
> > > > > > > >
> > > > > > > > Discussion Thread (not much has happened here :-) ):
> > > > > > > > Note: the title has changed from its original.
> > > > > > > > https://lists.apache.org/thread/l3fkr0h6j2g4tlmsov14fywmj58t3mtp
> > > > > > > >
> > > > > > > > This is my binding +1. The vote will last until 12:00 UTC
> > > > > > > > on 26th October, and until at least 3 binding votes have
> > > > > > > > been cast.
> > > > > > > >
> > > > > > > > Please vote accordingly:
> > > > > > > >
> > > > > > > > [ ] + 1 approve
> > > > > > > > [ ] + 0 no opinion
> > > > > > > > [ ] - 1 disapprove with the reason
> > > > > > > >
> > > > > > > > Only votes from PMC members and committers are binding,
> > > > > > > > but other members of the community are encouraged to check
> > > > > > > > the AIP and vote with "(non-binding)".
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > > Bolke
> > > > > > > >
> > > > > > > > --
> > > > > > > > Bolke de Bruin
> > > > > > > > bdbr...@gmail.com

--
Bolke de Bruin
bdbr...@gmail.com