And yet another update: - after seeing how it works I will remove requirement generation from pre-commit - now that it needs to be generated separately for different versions of python it's a bit too much overhead (you'd need to have more images downloaded for different python versions for pre-commit). Instaad I will add breeze commands to re-ggenerate the requirements (and bash scripts if you do not use breeze), and anyone changing setup.py will have to do it (otherwise CI builds will fail). I think this workflow will be great to keep our requirements up-to-date and have a stable installation method.
J. On Mon, Mar 23, 2020 at 5:54 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote: > > Update - It seems that we won't need the -pinned version eventually. I > realized that we need to have slightly different requirements for different > python versions. > > I just added PR for that: https://github.com/apache/airflow/pull/7841 > > I also found out (during production image exercise) that we can install > airflow predictably in a very simple way (once we release the requirements in > 1.10.10): > > pip install apache-airflow[gcp]==1.10.10 --constraint > https://raw.githubusercontent.com/apache/airflow/1.10.10/requirements/requirements-python3.7.txt > > I think this is simple enough to be used as installation method. I added it > to the documentation and I think I am ok with dropping -pinned package > altogether. > > J. > > > On Sun, Mar 22, 2020 at 10:15 AM Jarek Potiuk <jarek.pot...@polidea.com> > wrote: >> >> Yesterday we had another master breakage - this time from elasticsearch >> releasing MINOR version 7.6 breaking our builds (not it was MINOR version so >> should be compatible .... it was not for us). I fixed it quickly yesterday >> by limiting it to < 7.6 but for me - this is quite clear that trying to rely >> on SemVer being followed by others is a futile effort (at least in python's >> world). >> >> The theory is nice, but it breaks in practice. And it's not really a fault >> of the library maintainers. It's simply sometimes not so easy to see how >> your APIs are used - and in Python, you cannot prevent using stuff that you >> think is an internal detail. This is what happened in elasticsearch case >> yesterday - apparently, our plugin was using an "internal" API unknowingly >> and some parameters from that API were dropped during refactoring of >> elasticsearch library. >> >> My observation (it's anecdotal though) is that the COVID-19 situation made >> people have more time, fewer distractions, fewer things to do, and we have >> higher frequency of OSS packages being released recently so we should >> protect a bit from more often breakages. >> >> I think learning from yesterday is: >> >> * we should merge the requirements.txt solution quickly to prevent further >> breakages (I am reading and testing it now) - I think everyone agrees it's >> good to have it >> * I think we can continue discussing whether apache-airflow-pinned package >> should be released or not. I can leave the code building the package but we >> can decide about it after some more discussion >> >> Does it sound good? >> >> J >> >> >> >> On Fri, Mar 20, 2020 at 2:47 PM Jarek Potiuk <jarek.pot...@polidea.com> >> wrote: >>> >>> And rebased it right now and fixed automated requirements update. >>> >>> On Fri, Mar 20, 2020 at 2:28 PM Jarek Potiuk <jarek.pot...@polidea.com> >>> wrote: >>>> >>>> Ah BTW. I just noticed that for some reason I pasted an old PR earlier in >>>> the thread :(. >>>> This is the one with requirements.txt I am talking about: >>>> https://github.com/apache/airflow/pull/7730 >>>> >>>> On Fri, Mar 20, 2020 at 2:26 PM Jarek Potiuk <jarek.pot...@polidea.com> >>>> wrote: >>>>> >>>>> Nope. Not blocking. I can work with my branch just requirements.txt is >>>>> enough for that :) >>>>> >>>>> I think the problem with semver is that it is loosely followed - we had a >>>>> number of breakages in the past with minor version upgrades :(. >>>>> >>>>> J. >>>>> >>>>> >>>>> >>>>> On Fri, Mar 20, 2020 at 1:27 PM Kaxil Naik <kaxiln...@gmail.com> wrote: >>>>>> >>>>>> Thanks for the detailed explanation Jarek. >>>>>> >>>>>> How about we have an upper limit for all our dependencies, example >>>>>> instead >>>>>> of "google-cloud-storage>=1.16", we have >>>>>> "google-cloud-storage>=1.16,<2.0" ? >>>>>> >>>>>> If a dependency breaks compatibility in minor versions, we can't do >>>>>> anything about it but if they follow SemVer, we should be safe and the >>>>>> first-time installers would have a non-breaking package. WDYT? >>>>>> >>>>>> Btw I hope this is not blocking you in building a production image as I >>>>>> think requirements.txt is solving that? Please let me know if it is >>>>>> blocking. >>>>>> >>>>>> PS: I am also just dumping my ideas to solve this issue. Love to hear >>>>>> what >>>>>> others think too. >>>>>> >>>>>> Regards, >>>>>> Kaxil >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Mar 19, 2020 at 2:43 PM Jarek Potiuk <jarek.pot...@polidea.com> >>>>>> wrote: >>>>>> >>>>>> > I think we have similar understanding. But let me just clarify because >>>>>> > I >>>>>> > think we think about we think about solving two different problems >>>>>> > My proposal is not solving all problems with dependencies - quite the >>>>>> > contrary, I want to solve just one specific "repeatability" problem - >>>>>> > read >>>>>> > on :).. >>>>>> > >>>>>> > 1. A potential source of confusion: using "-pinned" for >>>>>> > installation but >>>>>> > > using "non-pinned" for DAG development. >>>>>> > > >>>>>> > >>>>>> > This could be confusing indeed - but they are the same in fact - >>>>>> > just deps might be different over time. >>>>>> > >>>>>> > 2. Most of the users would still try to install "apache-airflow" >>>>>> > package >>>>>> > > that might have been broken for example because of a dependency >>>>>> > release, >>>>>> > > either way, we would still have to suggest them to use "pinned" >>>>>> > version >>>>>> > > >>>>>> > >>>>>> > True. I thought we might describe it in the README and make it >>>>>> > prominently >>>>>> > explained. Usually people look at the readme in PyPI when they are >>>>>> > installing >>>>>> > stuff and it does not work: https://pypi.org/project/apache-airflow/. >>>>>> > >>>>>> > Also - we could of course explain how to use requirements.txt from the >>>>>> > released >>>>>> > version when they are installing it. That would be an extra friction >>>>>> > point >>>>>> > though >>>>>> > and maybe having "always installable" version of airflow is a better >>>>>> > choice. >>>>>> > >>>>>> > 3. If they install "pinned" version, it is no longer a library >>>>>> > again, >>>>>> > > that is users won't be able to use new NumPy release or >>>>>> > > matplotlib for >>>>>> > > example. In which case we are just circling back to the same >>>>>> > > problem, >>>>>> > > "either we risk broken package" while releasing or we risk >>>>>> > > potentially >>>>>> > > incompatible versions. >>>>>> > > >>>>>> > >>>>>> > Yep. But maybe it's just a question of naming. Maybe even we could name >>>>>> > this package differently to indicate that this version is a way to >>>>>> > quickly >>>>>> > install >>>>>> > airflow but not to do any serious development with it. >>>>>> > >>>>>> > So speaking about THE problem I want to solve with the >>>>>> > requirements.txt and apache-airflow-pinned package: >>>>>> > >>>>>> > I really only want to solve "first-time-user" experience here - nothing >>>>>> > more. I >>>>>> > definitely do not want to replace the current installation method for >>>>>> > experienced >>>>>> > users - for them using --constraint requirements.txt is exactly what >>>>>> > they >>>>>> > need. >>>>>> > The only problem I am trying to solve with that is "repeatability" of >>>>>> > installation. >>>>>> > >>>>>> > Maybe "apache-airflow-quickinstall" or something like that would be >>>>>> > better >>>>>> > than "apache-airflow-pinned" or "apache-airflow-repeatable-install" or >>>>>> > something like that. I think about it as a "flavour" of ariflow rather >>>>>> > than >>>>>> > anything else. I even originally implemented it as [pinned] extra >>>>>> > where I >>>>>> > pinned all requirements. Unfortunately I found that if you have >>>>>> > main requirement without limits, adding the same requirement as extra >>>>>> > with >>>>>> > == does not make it pinned :(. That was my original plan. >>>>>> > >>>>>> > >>>>>> > > Btw I have been on "we should have pinned dependency" camp as Airflow >>>>>> > > should definitely install without breaking since day-1 but I think a >>>>>> > > separate "-pinned" package won't solve that issue. >>>>>> > > >>>>>> > >>>>>> > Ah yeah we went the same route. I do not think we can solve the >>>>>> > "library vs. app" problem easily. This is a bit of "eat-and-have-cake" >>>>>> > at the same time. I know people have problems >>>>>> > with conflicting dependencies when they are trying to install libraries >>>>>> > with different requirements. And I am not even trying to solve that >>>>>> > problem now. Not even close. This requires some other solution >>>>>> > (for example separate virtualenvs with different dependencies >>>>>> > build from wheels on per-task basis). But that's something much further >>>>>> > in the future (if at all). >>>>>> > >>>>>> > >>>>>> > > >>>>>> > > WDYT? Also please do let me know if I have misunderstood something >>>>>> > > (definitely possible :D). >>>>>> > > >>>>>> > > Regards, >>>>>> > > Kaxil >>>>>> > >>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Jarek Potiuk >>>>> Polidea | Principal Software Engineer >>>>> >>>>> M: +48 660 796 129 >>>>> >>>>> >>>> >>>> >>>> -- >>>> >>>> Jarek Potiuk >>>> Polidea | Principal Software Engineer >>>> >>>> M: +48 660 796 129 >>>> >>>> >>> >>> >>> -- >>> >>> Jarek Potiuk >>> Polidea | Principal Software Engineer >>> >>> M: +48 660 796 129 >>> >>> >> >> >> -- >> >> Jarek Potiuk >> Polidea | Principal Software Engineer >> >> M: +48 660 796 129 >> >> > > > -- > > Jarek Potiuk > Polidea | Principal Software Engineer > > M: +48 660 796 129 > > -- Jarek Potiuk Polidea | Principal Software Engineer M: +48 660 796 129