It's a nice one :). However I think when/if we go to pinned dependencies the way poetry/pip-tools do it, this will be suddenly lot-less useful It will be very easy to track dependency changes (they will be always committed as a change in the .lock file or requirements.txt) and if someone has a problem while upgrading a dependency (always consciously, never accidentally) it will simply fail during CI build and the change won't get merged/won't break the builds of others in the first place :).
J. On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd.den...@gmail.com> wrote: > Hi folks, > > On top of this discussion, I was thinking we should have the ability to > quickly monitor dependency release as well. Previously, it happened for a > few times that CI kept failing for no reason and eventually turned out it > was due to dependency release. But it took us some time, sometimes a few > days, to realise the failure was because of dependency release. > > To partially address this, I tried to develop a mini tool to help us check > the latest release of Python packages & the release date-time on PyPi. So, > by comparing it with our CI failure history, we may be able to troubleshoot > faster. > > Output Sample (ordered by upload time in desc order): > Latest Version Upload Time > Package Name > awscli 1.16.28 2018-10-05T23:12:45 > botocore 1.12.18 2018-10-05T23:12:39 > promise 2.2.1 2018-10-04T22:04:18 > Keras 2.2.4 2018-10-03T20:59:39 > bleach 3.0.0 2018-10-03T16:54:27 > Flask-AppBuilder 1.12.0 2018-10-03T09:03:48 > ... ... > > It's a minimal tool (not perfect yet but working). I have hosted this tool > at https://github.com/XD-DENG/pypi-release-query. > > > XD > > On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <jarek.pot...@polidea.com> > wrote: > > > Hello Erik, > > > > I understand your concern. It's a hard one to solve in general (i.e. > > dependency-hell). It looks like in this case you treat Airflow as > > 'library', where for some other people it might be more like 'end > product'. > > If you look at the "pinning" philosophy - the "pin everything" is good > for > > end products, but not good for libraries. In the case you have Airflow is > > treated as a bit of both. And it's perfectly valid case at that (with > > custom python DAGs being central concept for Airflow). > > However, I think it's not as bad as you think when it comes to exact > > pinning. > > > > I believe - a bit counter-intuitively - that tools like pip-tools/poetry > > with exact pinning result in having your dependencies upgraded more > often, > > rather than less - especially in complex systems where dependency-hell > > creeps-in. If you look at Airflow's setup.py now - It's a bit scary to > make > > any change to it. There is a chance it will blow at your face if you > change > > it. You never know why there is 0.3 < ver < 1.0 - and if you change it, > > whether it will cause chain reaction of conflicts that will ruin your > work > > day. > > > > On the contrary - if you change it to exact pinning in > > .lock/requirements.txt file (poetry/pip-tools) and have much simpler (and > > commented) exclusion/avoidance rules in your .in/.tml file, the whole > setup > > might be much easier to maintain and upgrade. Every time you prepare for > > release (or even once in a while for master) one person might consciously > > attempt to upgrade all dependencies to latest ones. It should be almost > as > > easy as letting poetry/pip-tools help with figuring out what are the > latest > > set of dependencies that will work without conflicts. It should be rather > > straightforward (I've done it in the past for fairly complex systems). > What > > those tools enable is - doing single-shot upgrade of all dependencies. > > After doing it you can make sure that all tests work fine (and fix any > > problems that result from it). And then you test it thoroughly before you > > make final release. You can do it in separate PR - with automated testing > > in Travis which means that you are not disturbing work of others > > (compilation/building + unit tests are guaranteed to work before you > merge > > it) while doing it. It's all conscious rather than accidental. Nice side > > effect of that is that with every release you can actually "catch-up" > with > > latest stable versions of many libraries in one go. It's better than > > waiting until someone deliberately upgrades to newer version (and the > rest > > remain terribly out-dated as is the case for Airflow now). > > > > So a bit counterintuitively I think tools like pip-tools/poetry help you > to > > catch up faster in many cases. That is at least my experience so far. > > > > Additionally, Airflow is an open system - if you have very specific needs > > for requirements, you might actually - in the very same way with > > pip-tools/poetry - upgrade all your dependencies in your local fork of > > Airflow before someone else does it in master/release. Those tools kind > of > > democratise dependency management. It should be as easy as `pip-compile > > --upgrade` or `poetry update` and you will get all the "non-conflicting" > > latest dependencies in your local fork (and poetry especially seems to do > > all the heavy lifting of figuring out which versions will work). You > should > > be able to test and publish it locally as your private package for local > > installations. You can even mark the specific dependency you want to use > > specific version and let pip-tools/poetry figure out exact versions of > > other requirements. You can even make a PR with such upgrade eventually > to > > get it faster in master. You can even downgrade in case newer dependency > > causes problems for you in similar way. Guided by the tools, it's much > > faster than figuring the versions out by yourself. > > > > As long as we have simple way of managing it and document how to > > upgrade/downgrade dependencies in your own fork, and mention how to > locally > > release Airflow as a package, I think your case could be covered even > > better than now. What do you think ? > > > > J. > > > > On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand) > > <e...@novozymes.com.invalid> wrote: > > > > > For us, exact pinning of versions would be problematic. We have DAG > code > > > that shares direct and indirect dependencies with Airflow, e.g. lxml, > > > requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our > DAG > > > code for some reason needs a newer point release due to a bug that's > > fixed, > > > then we can't cleanly build a virtual environment containing the fixed > > > version. For us, it's already a problem that Airflow has quite strict > > (and > > > sometimes old) requirements in setup.py. > > > > > > Erik > > > ________________________________ > > > From: Jarek Potiuk <jarek.pot...@polidea.com> > > > Sent: Friday, October 5, 2018 2:01:15 PM > > > To: dev@airflow.incubator.apache.org > > > Subject: Re: Pinning dependencies for Apache Airflow > > > > > > I think one solution to release approach is to check as part of > automated > > > Travis build if all requirements are pinned with == (even the deep > ones) > > > and fail the build in case they are not for ALL versions (including > > > dev). And of course we should document the approach of > releases/upgrades > > > etc. If we do it all the time for development versions (which seems > quite > > > doable), then transitively all the releases will also have pinned > > versions > > > and they will never try to upgrade any of the dependencies. In poetry > > > (similarly in pip-tools with .in file) it is done by having a .lock > file > > > that specifies exact versions of each package so it can be rather easy > to > > > manage (so it's worth trying it out I think :D - seems a bit more > > > friendly than pip-tools). > > > > > > There is a drawback - of course - with manually updating the module > that > > > you want, but I really see that as an advantage rather than drawback > > > especially for users. This way you maintain the property that it will > > > always install and work the same way no matter if you installed it > today > > or > > > two months ago. I think the biggest drawback for maintainers is that > you > > > need some kind of monitoring of security vulnerabilities and cannot > rely > > on > > > automated security upgrades. With >= requirements those security > updates > > > might happen automatically without anyone noticing, but to be honest I > > > don't think such upgrades are guaranteed even in current setup for all > > > security issues for all libraries anyway. > > > > > > Finding the need to upgrade because of security issues can be quite > > > automated. Even now I noticed Github started to inform owners about > > > potential security vulnerabilities in used libraries for their project. > > > Those notifications can be sent to devlist and turned into JIRA issues > > > followed bvy minor security-related releases (with only few library > > > dependencies upgraded). > > > > > > I think it's even easier to automate it if you have pinned > dependencies - > > > because it's generally easy to find applicable vulnerabilities for > > specific > > > versions of libraries by static analysers - when you have >=, you never > > > know which version will be used until you actually perform the > > > installation. > > > > > > There is one big advantage for maintainers for "pinned" case. Your > users > > > always have the same dependencies - so when issue is raised, you can > > > reproduce it more easily. It's hard to know which version user has (as > > the > > > user could install it month ago or yesterday) and even if you find out > by > > > asking the user, you might not be able to reproduce the set of > > requirements > > > easily (simply because there are already newer versions of the > libraries > > > released and they are used automatically). You can ask the user to run > > pip > > > --upgrade but that's dangerous and pretty lame ("check the latest > > version - > > > maybe it fixes your problem ? ") and sometimes not possible (e.g. > someone > > > has pre-built docker image with dependencies from few months ago and > > cannot > > > rebuild the image easily). > > > > > > J. > > > > > > On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <a...@apache.org> > > wrote: > > > > > > > One thing to point out here. > > > > > > > > Right now if you `pip install apache-airflow=1.10.0` in a clean > > > > environment it will fail. > > > > > > > > This is because we pin flask-login to 0.2.1 but flask-appbuilder is > >= > > > > 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3. > > > > > > > > So I do think there is maybe something to be said about pinning for > > > > releases. The down side to that is that if there are updates to a > > module > > > > that we want then we have to make a point release to let people get > it > > > > > > > > Both methods have draw-backs > > > > > > > > -ash > > > > > > > > > On 4 Oct 2018, at 17:13, Arthur Wiedmer <arthur.wied...@gmail.com> > > > > wrote: > > > > > > > > > > Hi Jarek, > > > > > > > > > > I will +1 the discussion Dan is referring to and George's advice. > > > > > > > > > > I just want to double check we are talking about pinning in > > > > > requirements.txt only. > > > > > > > > > > This offers the ability to > > > > > pip install -r requirements.txt > > > > > pip install --no-deps airflow > > > > > For a guaranteed install which works. > > > > > > > > > > Several different requirement files can be provided for specific > use > > > > cases, > > > > > like a stable dev one for instance for people wanting to work on > > > > operators > > > > > and non-core functions. > > > > > > > > > > However, I think we should proactively test in CI against unpinned > > > > > dependencies (though it might be a separate case in the matrix) , > so > > > that > > > > > we get advance warning if possible that things will break. > > > > > CI downtime is not a bad thing here, it actually caught a problem > :) > > > > > > > > > > We should unpin as possible in setup.py to only maintain minimum > > > required > > > > > compatibility. The process of pinning in setup.py is extremely > > > > detrimental > > > > > when you have a large number of python libraries installed with > > > different > > > > > pinned versions. > > > > > > > > > > Best, > > > > > Arthur > > > > > > > > > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov > > > <ddavy...@twitter.com.invalid > > > > > > > > > > wrote: > > > > > > > > > >> Relevant discussion about this: > > > > >> > > > > >> > > > > > > > > > > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-airflow%2Fpull%2F1809%23issuecomment-257502174&data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&sdata=MM%2FoNwkPYR8UtBUczXLfZD2lCp7Ig%2BI%2FL2rFszcoJi8%3D&reserved=0 > > > > >> > > > > >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk < > > > jarek.pot...@polidea.com> > > > > >> wrote: > > > > >> > > > > >>> TL;DR; A change is coming in the way how > dependencies/requirements > > > are > > > > >>> specified for Apache Airflow - they will be fixed rather than > > > flexible > > > > >> (== > > > > >>> rather than >=). > > > > >>> > > > > >>> This is follow up after Slack discussion we had with Ash and > Kaxil > > - > > > > >>> summarising what we propose we'll do. > > > > >>> > > > > >>> *Problem:* > > > > >>> During last few weeks we experienced quite a few downtimes of > > > TravisCI > > > > >>> builds (for all PRs/branches including master) as some of the > > > > transitive > > > > >>> dependencies were automatically upgraded. This because in a > number > > of > > > > >>> dependencies we have >= rather than == dependencies. > > > > >>> > > > > >>> Whenever there is a new release of such dependency, it might > cause > > > > chain > > > > >>> reaction with upgrade of transitive dependencies which might get > > into > > > > >>> conflict. > > > > >>> > > > > >>> An example was Flask-AppBuilder vs flask-login transitive > > dependency > > > > with > > > > >>> click. They started to conflict once AppBuilder has released > > version > > > > >>> 1.12.0. > > > > >>> > > > > >>> *Diagnosis:* > > > > >>> Transitive dependencies with "flexible" versions (where >= is > used > > > > >> instead > > > > >>> of ==) is a reason for "dependency hell". We will sooner or later > > hit > > > > >> other > > > > >>> cases where not fixed dependencies cause similar problems with > > other > > > > >>> transitive dependencies. We need to fix-pin them. This causes > > > problems > > > > >> for > > > > >>> both - released versions (cause they stop to work!) and for > > > development > > > > >>> (cause they break master builds in TravisCI and prevent people > from > > > > >>> installing development environment from the scratch. > > > > >>> > > > > >>> *Solution:* > > > > >>> > > > > >>> - Following the old-but-good post > > > > >>> > > > > > > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fnvie.com%2Fposts%2Fpin-your-packages%2F&data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&sdata=PVE3S4mgki7L%2BcAe104o2cf68wRXolvYXRFmAyiX8gA%3D&reserved=0 > > > we are going to fix the > > > > >>> pinned > > > > >>> dependencies to specific versions (so basically all > dependencies > > > are > > > > >>> "fixed"). > > > > >>> - We will introduce mechanism to be able to upgrade > dependencies > > > with > > > > >>> pip-tools ( > > > > > > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fjazzband%2Fpip-tools&data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&sdata=Kt9CjWrolvpjp7MwIR2nn8EIf9CW9HW02U7GVGyOXMo%3D&reserved=0 > > ). > > > We might also > > > > >> take a > > > > >>> look at pipenv: > > > > > > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpipenv.readthedocs.io%2Fen%2Flatest%2F&data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&sdata=1tiY6pgX3IbRYC5W0HKr0ER2qMZ3GKYrwmWg%2BUo0tqs%3D&reserved=0 > > > > >>> - People who would like to upgrade some dependencies for their > > PRs > > > > >> will > > > > >>> still be able to do it - but such upgrades will be in their PR > > thus > > > > >> they > > > > >>> will go through TravisCI tests and they will also have to be > > > > specified > > > > >>> with > > > > >>> pinned fixed versions (==). This should be part of review > process > > > to > > > > >>> make > > > > >>> sure new/changed requirements are pinned. > > > > >>> - In release process there will be a point where an upgrade > will > > be > > > > >>> attempted for all requirements (using pip-tools) so that we are > > not > > > > >>> stuck > > > > >>> with older releases. This will be in controlled PR environment > > > where > > > > >>> there > > > > >>> will be time to fix all dependencies without impacting others > and > > > > >> likely > > > > >>> enough time to "vet" such changes (this can be done for > > alpha/beta > > > > >>> releases > > > > >>> for example). > > > > >>> - As a side effect dependencies specification will become far > > > simpler > > > > >>> and straightforward. > > > > >>> > > > > >>> Happy to hear community comments to the proposal. I am happy to > > take > > > a > > > > >> lead > > > > >>> on that, open JIRA issue and implement if this is something > > community > > > > is > > > > >>> happy with. > > > > >>> > > > > >>> J. > > > > >>> > > > > >>> -- > > > > >>> > > > > >>> *Jarek Potiuk, Principal Software Engineer* > > > > >>> Mobile: +48 660 796 129 > > > > >>> > > > > >> > > > > > > > > > > > > > > -- > > > > > > *Jarek Potiuk, Principal Software Engineer* > > > Mobile: +48 660 796 129 > > > > > > > > > -- > > > > *Jarek Potiuk, Principal Software Engineer* > > Mobile: +48 660 796 129 > > > -- *Jarek Potiuk, Principal Software Engineer* Mobile: +48 660 796 129