potiuk opened a new issue #12692: URL: https://github.com/apache/airflow/issues/12692
The tests we run in CI have shown that provider discovery based on entry_points is rather brittle. Example here: https://github.com/apache/airflow/pull/12466/checks?check_run_id=1467792592#step:9:4452

This is not a problem with Airflow but with pip, which might silently upgrade some packages and cause a "version conflict" completely independently of Airflow's configuration and entirely out of our control. Simply installing a .whl package on top of an existing Airflow installation (as happened in the case above) can leave the requirements inconsistent. In the case above, installing .whl packages with all providers on top of an existing Airflow installation caused the `requests` package to be upgraded, even though Airflow has the right requirement set. That requirement (which is correct and comes from the "install_requires" section of Airflow's setup.cfg) was:

```
Requirement.parse('requests<2.24.0,>=2.20.0'), {'apache-airflow'}
```

If you have a version conflict in your environment, calling `entry_point.load()` for a package affected by that conflict raises `pkg_resources.VersionConflict` or `pkg_resources.ContextualVersionConflict` instead of returning the loaded entry point. Or at least that's what I have observed so far. It's rather easy to reproduce: simply install `requests` > 2.24.0 in the current Airflow environment and see what happens.

So far I have not found a way to mitigate this problem, but @ashb, since you have more experience with this, maybe you can find a workaround? I think we have a few options:

1) We make `airflow` fail hard if there is any version conflict. Now that I've implemented #10854 (and once @ephraimbuddy finishes #12188), we have a good, maintainable list of non-conflicting dependencies for Airflow and its providers, and we can keep it that way in the future thanks to `pip check`.
   But I am afraid this would make life hard for people who want to install Airflow with some custom dependencies (TensorFlow, for example, is notoriously difficult to keep in sync with Airflow's dependencies, depending on the versions). However, this is the most "proper" solution.

2) We find a workaround for `entry_point.load()`. However, judging for example by this SO thread, that might be neither possible nor easy: https://stackoverflow.com/questions/52982603/python-entry-point-fails-for-dependency-conflict . The most upvoted answer there (with a single upvote) starts with "Welcome to the world of dependencies hell! I know no clean way to solve this", which is not very encouraging. I also tried to find a way in the docs and the code of `entry_point.load()`, but to no avail. @ashb, maybe you can help here.

3) We go back to my original implementation, where I read the provider info from a provider.yaml file embedded in the package. This has the disadvantage of being non-standard, but it works regardless of version conflicts.

WDYT?
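One possible direction for option 2, sketched under assumptions that have not been verified against Airflow itself: discover providers through the standard library's `importlib.metadata` (Python 3.8+) instead of `pkg_resources`. Its `EntryPoint.load()` just imports the target module and looks up the attribute; it does not validate the distribution's requirements, so a `requests` version conflict on its own should not make discovery raise. (Within `pkg_resources`, `EntryPoint.resolve()` similarly skips the `require()` check that `load()` performs.) The group name `apache_airflow_provider` and the helper names `entry_points_in_group` / `discover_providers` are illustrative assumptions, not existing Airflow APIs.

```python
import importlib.metadata as metadata
import logging
from typing import Any, Dict, List

log = logging.getLogger(__name__)

# Assumed entry point group name, used here for illustration only.
PROVIDER_GROUP = "apache_airflow_provider"


def entry_points_in_group(group: str) -> List[metadata.EntryPoint]:
    """Return the entry points registered under `group`.

    Handles both the dict returned by entry_points() on Python 3.8/3.9
    and the selectable object returned on Python 3.10+.
    """
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        return list(eps.select(group=group))
    return list(eps.get(group, ()))  # Python 3.8/3.9: dict of group -> tuple


def discover_providers(group: str = PROVIDER_GROUP) -> Dict[str, Any]:
    """Load every entry point in `group`, skipping (and logging) ones that fail.

    importlib.metadata's EntryPoint.load() only imports the module and fetches
    the attribute; it does not check install_requires, so a version conflict
    elsewhere in the environment does not by itself raise here.
    """
    providers: Dict[str, Any] = {}
    for ep in entry_points_in_group(group):
        try:
            # Keyed by the target ("module:attr") rather than ep.name, because
            # different distributions may register the same entry point name.
            providers[ep.value] = ep.load()
        except Exception:  # deliberately broad: one broken provider must not hide the rest
            log.warning("Skipping entry point %r in group %r", ep.value, group, exc_info=True)
    return providers
```

Usage would simply be `providers = discover_providers()`. The trade-off is that this only stops discovery from tripping over conflicts; it does not resolve them, so a provider that genuinely needs a different `requests` could still fail later at runtime, which is why option 1 (`pip check` in CI) would remain useful alongside it.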
