I like where this is heading, so I vote *+1*. Although, I would like to see
some examples of usage in DAGs (before/after would be great) that will help
support the following points that you have mentioned in the AIP
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565#AIP58AirflowObjectStore(AS)-Whyisitneeded?>:

1. Simplify DAG CI/CD
2. Streamlining pre-DAG to DAG (e.g. notebooks to DAG)
3. To allow DAG processing to be using arbitrary locations (object storage)
4. To have a unified interface to file operations in TaskFlow and
   traditional Operators

and some comments:

1. You do have *lineage* listed in the image
   <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565#AIP58AirflowObjectStore(AS)-Whatchangedoyouproposetomake?>,
   but is it follow-up work that you were thinking of, or is it part of AIP
   completion?
2. We would contribute the File abstraction as a follow-up to this AIP too,
   which will help with the Dataset story too

Regards,
Kaxil

On Thu, 19 Oct 2023 at 20:21, Bolke de Bruin <[email protected]> wrote:

> I don't mind waiting for that given a reasonable timeframe. Martin
> mentioned he wanted to do something at the end of the week. The vote to
> this AIP runs until next Thursday anyway :-).
>
> And thank you :-).
>
> B.
>
> On Thu, 19 Oct 2023 at 21:11, Jarek Potiuk <[email protected]> wrote:
>
> > One less worry I hope is that aiobotocore is actually starting to relax
> > its botocore requirements bringing it much closer to latest release:
> > https://github.com/aio-libs/aiobotocore/pull/1037
> >
> > Oh yes absolutely. Great timing. And our constraints ***JUST*** caught
> > up automatically with aiobotocore 2.7.0 - released just 2 days ago.
> >
> > We've been waiting for it for a long time and I believe the MWAA team
> > had some impact there (we've been discussing it a lot).
> >
> > And yes that will hopefully change my +1 on AIP-58 to +1! But only when
> > s3fs relax THEIR requirement of aiobotocore ~2.5.4 they currently have.
> > Currently just using s3fs will bring our botocore and aiobotocore in
> > constraints 2.5 months back.
> >
> > < boto3==1.28.64
> > < botocore==1.31.64 -> released 16 Oct 2023
> > ---
> > > boto3==1.28.17
> > > botocore==1.31.17 -> released 1 Aug 2023
> >
> > And it seems like everyone was waiting for it:
> > https://github.com/fsspec/s3fs/pull/809 - the s3fs change for it was
> > merged yesterday.
> >
> > So yes +1! I hope the s3fs release will happen before we merge AIP-58.
> >
> > J.
> >
> > On Thu, Oct 19, 2023 at 8:44 PM Bolke de Bruin <[email protected]> wrote:
> >
> > > Thanks for the thorough consideration, Jarek. I follow your concerns.
> > > The idea behind this AIP was to reduce the cognitive load on users by
> > > staying as pythonic as we can and to be gentle with the Airflow-isms.
> > > So I hope to limit that "yet another abstraction". I do agree that
> > > having great examples and documentation is going to be important. As
> > > a random idea, this
> > > https://medium.com/@fninsiima/de-mini-series-part-two-57770ff7cdf9
> > > can now be significantly simplified.
> > >
> > > One less worry I hope is that aiobotocore is actually starting to
> > > relax its botocore requirements bringing it much closer to latest
> > > release:
> > > https://github.com/aio-libs/aiobotocore/pull/1037
> > >
> > > On the requirements side there are actually not that many additional
> > > dependencies being brought in. Core fsspec does not bring any
> > > requirements. s3fs brings in three which are all covered by current
> > > ones. adlfs brings in five, all already part of our current set. Of
> > > course it does bring some complexity, but I do hope you see that it
> > > is fairly limited and if it does bring in anything it is well
> > > supported.
> > >
> > > The reason for creating common.io as a provider was that it was
> > > suggested that we might want to move a bit faster than core on the
> > > very simple (yet powerful ;-) ) FileTransferOperator.
> > >
> > > Considering this I hope you would like to make your measly +1 into a
> > > strong +1 :-).
> > >
> > > Cheers
> > > Bolke
> > >
> > > On Thu, 19 Oct 2023 at 19:48, Jarek Potiuk <[email protected]> wrote:
> > >
> > > > Finally caught up with this one, looked through code and
> > > > discussions. I am a little torn on that one but I did some more
> > > > research and I think it's a useful abstraction.
> > > >
> > > > +1 (binding)
> > > >
> > > > The big + of using fsspec is that it is already supported by the
> > > > most important "consumers" that are likely to be used in Airflow:
> > > > Pandas, PyArrow, Iceberg. The fact that you will be able to take an
> > > > S3/GCS ObjectStoragePath as an input directly and it will
> > > > transparently use the connection of Airflow is a big plus.
> > > >
> > > > I would just add that we should get real-life DAG examples on how
> > > > this might simplify code of their DAGs, it's cool. I think the
> > > > quality and clarity of the documentation that will come with it -
> > > > clearly explaining some cases and examples on how DAG authors can
> > > > make use of it to make their DAG authoring "better" - is a key to
> > > > success of this one. If we fail to explain it, it might become yet
> > > > another rarely used feature of Airflow.
> > > >
> > > > There is one worry I have - it adds "yet another abstraction" to
> > > > learn and "yet another set of dependencies" to Airflow. We have a
> > > > new "common.io" provider, we have many new dependencies, we have
> > > > aiobotocore as a requirement for AWS integration for example. I
> > > > already looked at the PR and attempted to help with some of the
> > > > dependency questions and problems, but we will have a few more of
> > > > those to solve and some decisions to make: should
> > > > apache-airflow-provider-common-io be default? Should it be included
> > > > in the reference image? etc. etc.
> > > > This will make Airflow and its dependencies more complex rather
> > > > than simpler. That's why I am not strong +1! just measly +1 -
> > > > because I see how it can make Airflow even "heavier" than it is
> > > > now.
> > > >
> > > > J.
> > > >
> > > > On Thu, Oct 19, 2023 at 4:34 PM Igor Kholopov <[email protected]>
> > > > wrote:
> > > >
> > > > > Thanks for incorporating the feedback!
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > On Thu, Oct 19, 2023 at 1:55 PM Dennis Akpenyi <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > On Thu, Oct 19, 2023 at 12:24 PM Bolke de Bruin <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Dear Community,
> > > > > > >
> > > > > > > I would like to start a vote for "AIP-58 Add Airflow
> > > > > > > ObjectStore".
> > > > > > >
> > > > > > > You can find the AIP here:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565
> > > > > > >
> > > > > > > Implementing PR (most of the discussion happened here):
> > > > > > > https://github.com/apache/airflow/pull/34729
> > > > > > >
> > > > > > > Discussion Thread (not much has happened here :-) ):
> > > > > > > Note: the title has changed from its original.
> > > > > > >
> > > > > > > https://lists.apache.org/thread/l3fkr0h6j2g4tlmsov14fywmj58t3mtp
> > > > > > >
> > > > > > > This is my binding +1, the vote will last until 12:00 UTC on
> > > > > > > 26th October, and until at least 3 binding votes have been
> > > > > > > cast.
> > > > > > >
> > > > > > > Please vote accordingly:
> > > > > > >
> > > > > > > [ ] + 1 approve
> > > > > > > [ ] + 0 no opinion
> > > > > > > [ ] - 1 disapprove with the reason
> > > > > > >
> > > > > > > Only votes from PMC members and committers are binding, but
> > > > > > > other members of the community are encouraged to check the
> > > > > > > AIP and vote with "(non-binding)".
> > > > > > >
> > > > > > > Cheers
> > > > > > > Bolke
> > > > > > > --
> > > > > > >
> > > > > > > --
> > > > > > > Bolke de Bruin
> > > > > > > [email protected]
> > >
> > > --
> > >
> > > --
> > > Bolke de Bruin
> > > [email protected]
>
> --
>
> --
> Bolke de Bruin
> [email protected]
