The foundational implementation is in core, as suggested in the AIP. The
Operator(s) are in common.io to allow for faster iterations.

The challenge with fsspec is not fsspec itself, which brings very few
additional dependencies of its own. It is s3fs, which relies on aiobotocore,
which in turn relies on a specific set of botocore versions. By pinning
fsspec / s3fs we are effectively pinning botocore to that set of versions,
which might not be what we want.

Maybe the MWAA team can nudge the boto team to finally provide async as a
first class citizen (hint hint hint) :-).

Bolke

On Tue, 24 Oct 2023 at 08:07, Amogh Desai <[email protected]> wrote:

> Thanks for bringing this up, Bolke.
>
> I generally like the idea of having AS and I like where the discussions
> here are going.
>
> Just one qn I have regarding where this will fit into the wider ecosystem
> is that, should we integrate this into core rather than a provider?
> Meaning, it makes more sense to have this be in the core
> given that this is a pretty common problem across stakeholders.
>
> I also agree with Hussein's concern above. Maybe that can be tackled by
> pinning a stable version? (Since we don't really NEED new features from
> the fsspec package but rather need a stable one)
>
> Thanks & Regards,
> Amogh Desai
>
> On Sun, Oct 22, 2023 at 5:10 PM Hussein Awala <[email protected]> wrote:
>
> > This AIP will have a great positive impact on the project:
> > - Airflow will be increasingly used as a scheduler for ML projects.
> > - Simplifying the file transfer operators by replacing them with a single
> >   one for all the file/object storage services.
> > - Implementing and managing the CI/CD pipelines for Airflow DAGs will be
> >   much easier once we use the new feature in the DAGs processor (out of the
> >   scope of this AIP but will be possible).
> > - Opportunity to support a generic XCom backend based on AFS, helpful in
> >   sharing big files between the tasks and creating dynamic tasks from files
> >   (out of the scope of this AIP but will be possible).
> >
> > +1 (binding)
> >
> > My only concern is the stability of fsspec packages; I had a bad experience
> > with s3fs and gcsfs in the past due to patch/minor releases with breaking
> > changes or conflict with botocore for s3fs, hope release management has
> > improved since.
> >
> > On Fri, Oct 20, 2023 at 4:33 PM Bolke de Bruin <[email protected]> wrote:
> >
> > > I have added an example for the use of the FileTransferOperator in the PR.
> > > This is a 'port' of the local_to_s3 dag that is used elsewhere in the
> > > examples. I kept the structure as per that original, but it could be
> > > reduced to a two-liner (in dag-speak).
> > >
> > > I agree with Jens that the PR needs to settle a bit; more on the
> > > implementation rather than the API imho - the API mostly comes from
> > > pathlib.Path + fsspec extensions (but please shoot at it!). I hope we can
> > > consider it 'settled enough' by the time this vote ends.
> > >
> > > Cheers
> > > Bolke
> > >
> > > On Fri, 20 Oct 2023 at 09:43, Scheffler Jens (XC-DX/ETV5)
> > > <[email protected]> wrote:
> > >
> > > > +1 (non binding) for making this AIP in general.
> > > >
> > > > I had a couple of comments and the rework and comments are very active. I
> > > > assume the PR needs to settle for a moment and there still a lot of
> > > > different opinions - which is fair with the given complexity. The value is
> > > > very high but I fear a bit that we nail down the API a bit too fast. But it
> > > > is a feature that will need to stay and we need to make it "right". So I
> > > > propose either the PR stays for a moment to mature or we need to mark the
> > > > feature at least for one version to be "experimental" --> to have the
> > > > ability to adjust API if we learn in real life - not being "locked" into
> > > > API v1 for years.
> > > >
> > > > I also would like to see examples, but maybe I need to catch-up with all
> > > > the ongoing changes as well.
> > > >
> > > > THANKS for the efforts and the concepts Bolke!
> > > >
> > > > Best regards
> > > >
> > > > Jens Scheffler
> > > >
> > > > Deterministik open Loop (XC-DX/ETV5)
> > > > Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
> > > > GERMANY | http://www.bosch.com/
> > > >
> > > > -----Original Message-----
> > > > From: Kaxil Naik <[email protected]>
> > > > Sent: Friday, 20 October 2023 01:00
> > > > To: [email protected]
> > > > Subject: Re: [VOTE] AIP-58 Airflow ObjectStore
> > > >
> > > > I like where this is heading, so I vote *+1*.
> > > >
> > > > Although, I would like to see some examples of usage in DAGs (before/after
> > > > would be great) that will help support the following points that you have
> > > > mentioned in the AIP
> > > > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565#AIP58AirflowObjectStore(AS)-Whyisitneeded?>:
> > > >
> > > >    1. Simplify DAG CI/CD
> > > >    2. Streamlining pre-DAG to DAG (e.g. notebooks to DAG)
> > > >    3. To allow DAG processing to be using arbitrary locations (object
> > > >       storage)
> > > >    4. To have a unified interface to file operations in TaskFlow and
> > > >       traditional Operators
> > > >
> > > > and some comments:
> > > >
> > > >    1. You do have *lineage* listed in the image
> > > >       <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565#AIP58AirflowObjectStore(AS)-Whatchangedoyouproposetomake?>,
> > > >       but is it a follow-up work that you were thinking or was it part of
> > > >       AIP completion?
> > > >    2. We would contribute the File abstraction as a follow-up to this AIP
> > > >       too, which will help with the Dataset story too
> > > >
> > > > Regards,
> > > > Kaxil
> > > >
> > > > On Thu, 19 Oct 2023 at 20:21, Bolke de Bruin <[email protected]> wrote:
> > > >
> > > > > I dont mind waiting for that given a reasonable timeframe. Martin
> > > > > mentioned he wanted to do something at the end of the week. The vote
> > > > > to this AIP runs until next Thursday anyway :-).
> > > > >
> > > > > And thank you :-).
> > > > >
> > > > > B.
> > > > >
> > > > > On Thu, 19 Oct 2023 at 21:11, Jarek Potiuk <[email protected]> wrote:
> > > > >
> > > > > > > One less worry I hope is that aiobotocore is actually starting to
> > > > > > > relax its botocore requirements bringing it much closer to latest
> > > > > > > release: https://github.com/aio-libs/aiobotocore/pull/1037
> > > > > >
> > > > > > Oh yes absolutely. Great timing. And our constraints ***JUST***
> > > > > > caught up automatically with aiobotocore 2.7.0 - released just 2 days
> > > > > > ago.
> > > > > >
> > > > > > We've been waiting for it for a long time and I believe the MWAA
> > > > > > team had some impact there (we've been discussing it a lot).
> > > > > >
> > > > > > And yes that will Hopefully change my +1 on AIP-58 to +1! But only
> > > > > > when s3fs relax THEIR requirement of aiobotocore ~2.5.4 they currently
> > > > > > have. Currently just using s3fs will bring our botocore and aiobotocore
> > > > > > in constraints 2.5 months back.
> > > > > >
> > > > > > < boto3==1.28.64
> > > > > > < botocore==1.31.64 -> released 16 Oct 2023
> > > > > > ---
> > > > > > > boto3==1.28.17
> > > > > > > botocore==1.31.17 -> released 1 Aug 2023
> > > > > >
> > > > > > And it seems like everyone was waiting for it:
> > > > > > https://github.com/fsspec/s3fs/pull/809 - the s3fs change for it was
> > > > > > merged yesterday.
> > > > > >
> > > > > > So yes +1! I hope the s3fs release will happen before we merge AIP-58.
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > On Thu, Oct 19, 2023 at 8:44 PM Bolke de Bruin <[email protected]> wrote:
> > > > > >
> > > > > > > Thanks for thorough consideration Jarek. I follow your concerns. The
> > > > > > > idea behind this AIP was to reduce the cognitive load on users by
> > > > > > > staying as much pythonic as we can and to be gentle with the
> > > > > > > Airflow-isms. So I hope to limit that "yet another abstraction".
> > > > > > > I do agree that having great examples and documentation are going to
> > > > > > > be important. As a random idea, this
> > > > > > > https://medium.com/@fninsiima/de-mini-series-part-two-57770ff7cdf9,
> > > > > > > can now be significantly simplified.
> > > > > > >
> > > > > > > One less worry I hope is that aiobotocore is actually starting to
> > > > > > > relax its botocore requirements bringing it much closer to latest
> > > > > > > release: https://github.com/aio-libs/aiobotocore/pull/1037
> > > > > > >
> > > > > > > On the requirements side there are actually not that many
> > > > > > > additional dependencies being brought in.
> > > > > > > Core fsspec does not bring any requirements. s3fs brings in three
> > > > > > > which are all covered by current ones.
> > > > > > > adlfs brings in five, all already part of our current set.
> > > > > > > Of course it does bring some complexity, but I do hope you see that
> > > > > > > it is fairly limited and if it does bring in anything it is well
> > > > > > > supported.
> > > > > > >
> > > > > > > The reason for creating common.io as a provider was that it was
> > > > > > > suggested that we might want to move a bit faster than core on the
> > > > > > > very simple (yet powerful ;-) ) FileTransferOperator.
> > > > > > >
> > > > > > > Considering this I hope you would like to make your measly +1 into a
> > > > > > > strong +1 :-).
> > > > > > >
> > > > > > > Cheers
> > > > > > > Bolke
> > > > > > >
> > > > > > > On Thu, 19 Oct 2023 at 19:48, Jarek Potiuk <[email protected]> wrote:
> > > > > > >
> > > > > > > > Finally caught up with this one, looked through code and
> > > > > > > > discussions. I am a little torn on that one but I did some more
> > > > > > > > research and I think it's a useful abstraction.
> > > > > > > >
> > > > > > > > +1(binding)
> > > > > > > >
> > > > > > > > The big + of using fsspec is that it is already supported by the
> > > > > > > > most important "consumers" that are likely to be used in
> > > > > > > > Airflow. Pandas, Pyarrow, Iceberg. The fact that you will be
> > > > > > > > able to take an S3/GCS ObjectStoragePath as an input directly
> > > > > > > > and it will transparently use the connection of Airflow is a big
> > > > > > > > plus.
> > > > > > > >
> > > > > > > > I would just add that we should get real-life DAG examples on
> > > > > > > > how this might simplify code of their DAGs, it's cool.
> > > > > > > > I think the quality and clarity of the documentation that will
> > > > > > > > come with it - clearly explaining some cases and examples on how
> > > > > > > > DAG authors can make use of it to make their DAG authoring
> > > > > > > > "better" - is a key to success of this one. If we fail to
> > > > > > > > explain it, it might become yet another rarely used feature of
> > > > > > > > Airflow.
> > > > > > > >
> > > > > > > > There is one worry I have - it adds "yet another abstraction" to
> > > > > > > > learn and "yet another set of dependencies" to Airflow. We have a
> > > > > > > > new "common.io" provider, we have many new dependencies, we have
> > > > > > > > aiobotocore as a requirement for AWS integration for example. I
> > > > > > > > already looked at the PR and attempted to help with some of the
> > > > > > > > dependency questions and problems, but we will have a few more of
> > > > > > > > those to solve and some decisions to make: should
> > > > > > > > apache-airflow-provider-common-io be default? Should it be
> > > > > > > > included in the reference image? etc. etc. This will make Airflow
> > > > > > > > and its dependencies more complex than simpler. That's why I am
> > > > > > > > not strong +1! just measly +1 - because I see how it can make
> > > > > > > > airflow even "heavier" than it is now.
> > > > > > > >
> > > > > > > > J.
> > > > > > > >
> > > > > > > > On Thu, Oct 19, 2023 at 4:34 PM Igor Kholopov
> > > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Thanks for incorporating the feedback!
> > > > > > > > >
> > > > > > > > > +1 (non-binding)
> > > > > > > > >
> > > > > > > > > On Thu, Oct 19, 2023 at 1:55 PM Dennis Akpenyi
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > +1 (non-binding)
> > > > > > > > > >
> > > > > > > > > > On Thu, Oct 19, 2023 at 12:24 PM Bolke de Bruin
> > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Dear Community,
> > > > > > > > > > >
> > > > > > > > > > > I would like to start a vote for "AIP-58 Add Airflow
> > > > > > > > > > > ObjectStore".
> > > > > > > > > > >
> > > > > > > > > > > You can find the AIP here:
> > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565
> > > > > > > > > > >
> > > > > > > > > > > Implementing PR (most of the discussion happened here):
> > > > > > > > > > > https://github.com/apache/airflow/pull/34729
> > > > > > > > > > >
> > > > > > > > > > > Discussion Thread (not much has happened here :-) ):
> > > > > > > > > > > Note: the title has changed from its original.
> > > > > > > > > > > https://lists.apache.org/thread/l3fkr0h6j2g4tlmsov14fywmj58t3mtp
> > > > > > > > > > >
> > > > > > > > > > > This is my binding +1. The vote will last until 12:00 UTC
> > > > > > > > > > > on 26th October, and until at least 3 binding votes have
> > > > > > > > > > > been cast.
> > > > > > > > > > >
> > > > > > > > > > > Please vote accordingly:
> > > > > > > > > > >
> > > > > > > > > > > [ ] + 1 approve
> > > > > > > > > > > [ ] + 0 no opinion
> > > > > > > > > > > [ ] - 1 disapprove with the reason
> > > > > > > > > > >
> > > > > > > > > > > Only votes from PMC members and committers are binding,
> > > > > > > > > > > but other members of the community are encouraged to check
> > > > > > > > > > > the AIP and vote with "(non-binding)".
> > > > > > > > > > >
> > > > > > > > > > > Cheers
> > > > > > > > > > > Bolke
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Bolke de Bruin
> > > > > > > > > > > [email protected]
> > > > > > >
> > > > > > > --
> > > > > > > Bolke de Bruin
> > > > > > > [email protected]
> > > > >
> > > > > --
> > > > > Bolke de Bruin
> > > > > [email protected]
> > >
> > > --
> > > Bolke de Bruin
> > > [email protected]

--
Bolke de Bruin
[email protected]
