potiuk opened a new issue #12692:
URL: https://github.com/apache/airflow/issues/12692


   The tests we run in CI have shown that provider discovery based on 
entry_points is rather brittle. 
   
   Example here:
   
   
https://github.com/apache/airflow/pull/12466/checks?check_run_id=1467792592#step:9:4452
   
   This is not a problem with Airflow, but with pip, which might silently upgrade 
some packages and cause a "version conflict" totally independently of the Airflow 
configuration and totally out of our control. 
   
   Simply installing a .whl package on top of an existing Airflow installation 
(as happened in the case above) might cause inconsistent requirements. In 
the case above, installing .whl packages with all providers on top of an existing 
Airflow installation caused a required package to be upgraded even though 
Airflow has the right requirements set. The violated requirement was the one 
below (it is correct and comes from the "install_requires" section of Airflow's 
setup.cfg):
   
   ```
   Requirement.parse('requests<2.24.0,>=2.20.0'), {'apache-airflow'}
   ```
   
   If you have a version conflict in your env, running 
entry_point.load() for a package affected by the conflict raises 
`pkg_resources.VersionConflict` or 
`pkg_resources.ContextualVersionConflict` rather than returning the 
entry point. Or at least that's what I have observed so far. It's rather easy to 
reproduce: simply install requests >= 2.24.0 in the current Airflow environment 
and see what happens.
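
To see which entry points are affected in a given environment, something like the sketch below could help. It only records the failures, it does not fix them. The helper name `load_entry_points` is hypothetical, and it takes any iterable of objects exposing `.name` and `.load()` (which is what `pkg_resources` entry points provide), so it can be tried without a broken environment.

```python
from typing import Any, Dict, Iterable, Tuple


def load_entry_points(
    entry_points: Iterable[Any],
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Try to load every entry point; return (loaded, failed) keyed by name.

    Hypothetical helper, not part of Airflow. Each item only needs a
    ``name`` attribute and a ``load()`` method.
    """
    loaded: Dict[str, Any] = {}
    failed: Dict[str, Exception] = {}
    for entry_point in entry_points:
        try:
            loaded[entry_point.name] = entry_point.load()
        except Exception as err:
            # With pkg_resources, a dependency conflict surfaces here as
            # VersionConflict (ContextualVersionConflict is a subclass).
            failed[entry_point.name] = err
    return loaded, failed
```

Assuming the providers register under a group such as `apache_airflow_provider` (the group name is an assumption here), calling `load_entry_points(pkg_resources.iter_entry_points("apache_airflow_provider"))` in an environment with a conflicting `requests` should put the affected providers into `failed`.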
   
   So far I could not find a way to mitigate this problem, but @ashb - since 
you have more experience with it, maybe you can find a workaround for this?
   
   I think we have a few options:
   
   1) We fail 'airflow' hard if there is any version conflict. This is feasible 
now: after I've implemented #10854 (and after @ephraimbuddy finishes 
#12188) we have a good, maintainable list of non-conflicting dependencies 
for Airflow and its providers, and we can keep it that way in the future thanks to 
pip-check. But I am afraid that this will give a hard time to people who would like 
to install Airflow with some custom dependencies (Tensorflow, for example, 
is, depending on the version, notoriously difficult to sync with Airflow when it 
comes to dependencies). However, this is the most "proper" solution.
   
   2) We find a workaround for entry_point.load(). However, I think that 
might not be possible or easy, looking for example at this SO thread: 
https://stackoverflow.com/questions/52982603/python-entry-point-fails-for-dependency-conflict
 . The most upvoted (=1) answer there starts with "Welcome to the world of 
dependencies hell! I know no clean way to solve this" - which is not very 
encouraging. I also tried to work it out from the docs and code of 
entry_point.load(), but to no avail. @ashb - maybe you can help here.
   
   3) We go back to my original implementation, where I read provider 
info from a provider.yaml embedded in the package. This has the disadvantage of 
being non-standard, but it works independently of version conflicts.
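
For option 3, a rough sketch of the idea might look like the following. It is not the original implementation. It assumes the file is called `provider.yaml` and sits at the top level of the provider package; the function name `read_embedded_provider_yaml` is made up for illustration. It returns the raw text only, so parsing (e.g. with a YAML library) is left to the caller.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def read_embedded_provider_yaml(package_name: str) -> Optional[str]:
    """Return the raw text of provider.yaml shipped in a package, or None.

    Hypothetical helper: locates the package on disk through its import
    spec, so no entry points and no requirement checks are involved.
    """
    try:
        spec = importlib.util.find_spec(package_name)
    except ModuleNotFoundError:
        # Raised when a parent of a dotted package name is not installed.
        return None
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a plain module rather than a package.
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / "provider.yaml"
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    return None
```

Note that for a dotted name `find_spec` imports the parent packages, but it does not consult installed requirement metadata, which is why a version conflict should not get in the way here.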
   
   
   WDYT?
   
   
   



