On Wed, Feb 5, 2020 at 11:31 AM Ash Berlin-Taylor <[email protected]> wrote:
> Are you talking about/test with making _ALL_ packages implicit namespaces? > > I would think that we would only make airflow.providers an implicit > package, but leave all the sub packages as explicit packages: i.e. this: > Nope. Pretty much all of the problems (except the email, kerberos and sqlachemy) are caused by implicit providers packages. But I think it's good to change the names of modules in this case to not be the same as top-level package names anyway. It's nice that implicit packages helped to detect it. I think if we are making providers packages implicit, it makes perfect sense to make all of them implicit where possible (i.e. where we have empty _init.py). That makes a lot of sense IMHO and frees our brains from making yet another decision. For almost everything in core it's just __init__.py removal and everything works. I think in this case consistency is good. Also we have all the protection to keep it correct in the future. All the automation we have in CI will detect and clearly report if anyone adds duplicated module, import package that has the same name as module etc. so we are very well covered here. J. > Does limiting the implicit namespace to just airflow.providers address > 1+2+3? > -a > On Feb 5 2020, at 10:27 am, Jarek Potiuk <[email protected]> wrote: > > TL;DR; I am asking for opinions on some of the changes required to > introduce implicit native package support. This is the easiest way to make > backports of master providers packages to 1.10.x. > > > > > > NOTE!: @Ash Berlin-Taylor (mailto:[email protected]) @Fokko Driesprong > (mailto:[email protected]) @Felix Uellendall (mailto:[email protected]) > @Kamil Breguła (mailto:[email protected]) @Kaxil Naik (mailto: > [email protected]) @[email protected] (mailto: > [email protected]) @Philippe Gagnon (mailto: > [email protected]) - there is one point 1a) below that I would like > to get your input. > > > > As AIP-21 (import paths) import path move is done we have an important > step to complete - we should switch to implicit "native" packages. It boils > down to removal of empty (only with comment licences) __init__.py files and > relying on python 3.3+ capability of finding packages without having to add > __init_.py files. This is needed in order to get backported packaged to be > prepared for 1.10.* series. I worked on it during the last few days - to > make sure this can be done and I am close to have it. Close enough to know > I can solve all the remaining problems. I wanted to check that we can do it > for almost all the packages. I already solved most of the initial problems > but I have some places where I think I need community opinion on the way we > should solve them: > > > > The PR that has the changes is here: > https://github.com/apache/airflow/pull/7279/files - it's not 100% ready > yet and I want to split it into a few separate PRs so do not comment it > there yet - I extracted the most important changes here and wanted to ask > your opinion where I have doubts: > > > > 1) Module names identical to some top-level package names (for example > email.py): > > > > Some of the module names (for example email.py) has to be changed. The > problem with implicit packages is that if local module is named the same as > the top level package, importing the top-level package from it does not > work: > > > > email.py: > > ... > > import email <- this won't work - module imports itself. > > > > For now in my PR i renamed the modules to be airflow_<old_name>.py or > <old_name>_utils.py : There are such modules (mostly hooks): > > email -> email_utils, (utils) > > > > cassandra -> airflow_cassandra > > > > cloudant -> airflow_cloudant > > > > dataproc -> airflow_dataproc > > > > grpc -> airlfow_grpc > > > > hdfs -> airflow_hdfs > > > > jira -> airflow_jira > > > > kerberos -> kerberos_security > > > > redis -> airflow_redis > > > > sendgrid -> airflow_sendgrid > > > > snowflake -> airflow_snowflake > > > > winrm -> airflow_winrm (operator) > > > > datadog -> airflow_datatog(sensor) > > > > sqlalchemy -> sqlalchemy_utils (utils) > > > > > > > > I can also change it to grpc_hooks.py, datadog_sensors.py etc. Or maybe > someone knows an easy way how to keep the module name and implicit packages? > > > > I guess this is not a big problem to change it this way - we anyhow have > changed import paths for those. The only two problems with this are: > > a) email was used in "email_backend" configuration : email_backend = > airflow.utils.email.send_email_smtp > > but we can solve it by handling that as special case, raising > deprecation warning and converting it to > airflow.utils.email_utils.send_email_smtp > > b) we introduce slight inconsistency in hook/operator/sensor names. > > > > 2) Mypy fails when checking individual (duplicated) module names > > > > Mypy does not work well with duplicate module names in implicit packages > (https://github.com/pre-commit/mirrors-mypy/issues/5) and with AIP-21 we > have plenty of them. Repeating modules (http.py in both operators and hooks > for example) cause it to fail in case they are provided as parameters (in > case of incremental mypy check in pre-commits. I had to change it then so > that mypy always check 'airflow' and 'test' packages instead - which makes > pre-commit check slightly slower (few seconds). > > > > 3) _hooks.py/_operators.py/_sensors.py ( > http://hooks.py/_operators.py/_sensors.py) suffix: Some thought resulting > from above 1) and 2): > > > > I did not want to open pandora's box again but I think removal of > _hooks, _operators, _sensors from the module name might not have been the > best decision. While it removes some duplication in module name, it > actually introduces duplicate module names and (as it is with mypy) it > might be a problem in the future. I think now we should change it back to > add the suffixes. If we want to reverse it - this is the last moment we can > do it (and we still can easily). I would love some quick opinion/voting for > that - especially those who voted in AIP-21 but others are welcome as well: > @Ash Berlin-Taylor (mailto:[email protected]) @Fokko Driesprong (mailto: > [email protected]) @Felix Uellendall (mailto:[email protected]) @Kamil > Breguła (mailto:[email protected]) @Kaxil Naik (mailto: > [email protected]) @[email protected] (mailto: > [email protected]) @Philippe Gagnon (mailto: > [email protected]) - this was "Case 2" in the original AIP-21 > proposal. > > > > WDYT? > > > > 4) Test module names: > > > > I had to change all test module name for providers. For example: . > Pytest does not like repeating implicit test module names. It throws error > "duplicate test module). For example: > > tests/providers/http/hooks/test_http.py → > tests/providers/http/hooks/test_http_hooks.py ( > https://github.com/apache/airflow/pull/7279/files#diff-593e762ce1d79c250426a27b6fb28908 > ) > > > > > > (basically I added hooks/operators/sensors everywhere). It is easy and > fully automated and it has no impact at all on the main code base, so I > think this is a super-safe change. t The nice thing is that it makes it > easier to understand what kind of tests you are looking at immediately and. > That also hints that maybe _hooks.py, _operators.py, _sensors.py in > hooks/operators/sensors could have been a better choice (see 2 a) ) > > > > 5) Doc generation > > > > Autoapi doc generation has not yet released implicit package support (it > is in master only). However I managed to monkey-patch 1.2.1 (latest) > version to add support by cherry-picking the changes (there are literally > few methods changed there - > https://github.com/readthedocs/sphinx-autoapi/pull/178) and when new > version is released we will be able to get rid of monkey-patching . I am > still resolving a few import issues in that newest version but I am > confident I can fix or workaround all issues with it. > > I don't expect this to be a problem - it is only in the doc generation > code, not the main application code. > > > > I verified that pytest test discovery works fine, and that airflow can > be installed and works with implicit packages. Let me know what you think > about it and I would love to merge it as soon as possible (rebasing this > for a long time might be painful). > > > > J. > > > > > > > > > > > > On Sat, Nov 23, 2019 at 9:22 AM Jarek Potiuk <[email protected] > (mailto:[email protected])> wrote: > > > I've read a bit more about PEP-0420 so some more extracted information: > > > > > > - implicit namespace packages are very similar to regular packages. > The notable difference is that they can be loaded from several directories > (so you can have modules in the same package but coming from different > physical directories), so no __file__ is defined for the package (it is > still defined for modules), > > > > > > - pytest should have no problems. It has indeed ( > https://stackoverflow.com/questions/50174130/how-do-i-pytest-a-project-using-pep-420-namespace-packages) > problems with implicit packages when tests are co-located with their tested > modules (module.py, module_test.py) but we are not in this situation. > > > > > > - I do not see how it could be slower - I have not found any > benchmarking yet (but I have not found any complaints about it either). I > looked at the loading algorithm for PEP-420 - it traverses all the > directory structure available on the PYTHONPATH bo so does the "standard" > mechanism. And we would not have to parse and load all the 140 licence-only > __init__.py. The presence of module in the package is based on the presence > of "<module>.py" in the right folder. If we would have several packages > elsewhere that might become a bit slower, but I think reading files is so > much slower than scanning directories that - if anything - we will be > faster (just opening and closing those __init__.files will take quite some > time :). We can do some benchmarking before and after to be sure. > > > > > > Side comment: I am not 100% sure, but after reading PEP-420 intuition > tells me that maybe implicit packages can help in solving relative imports > problems that some of our user raised . I saw recently several users had > problems with relative imports used in DAGs. We are loading the DAGs in a > specific way and in implicit packages, the __repr__ of modules parent is > dynamic - based on the location of the file rather than based on the module > of the file, so it's likely if we encourage dag developers also to use > implicit packages, it might actually solve the relative imports case. > > > > > > J. > > > On Fri, Nov 22, 2019 at 4:29 PM Daniel Imberman < > [email protected] (mailto:[email protected])> wrote: > > > > Fine by me if it doesn’t break anything. I’m always on the side of > less code :). > > > > > > > > via Newton Mail [ > https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.32&pv=10.14.5&source=email_footer_2 > ] > > > > On Fri, Nov 22, 2019 at 8:12 AM, Ash Berlin-Taylor <[email protected] > (mailto:[email protected])> wrote: > > > > Probably worth it, but things to check: > > > > > > > > - if nose or pytest needs them under tests/ to find all the tests > > > > - if not having the init files slows down pythons package importer > as it searches more paths? > > > > > > > > -a > > > > > On 22 Nov 2019, at 12:06, Jarek Potiuk <[email protected] > (mailto:[email protected])> wrote: > > > > > > > > > > There is one implementation detail in AIP-21 that I continue to > have > > > > > questions about, and I wanted to ask community about it. > > > > > > > > > > Since we moved to Python 3, we have the option of using implicit > namespace > > > > > packages. > > > > > https://www.python.org/dev/peps/pep-0420/ . We have now grouping > per folder > > > > > for services, so it would require a lot more __init__.py files if > we > > > > > continue using them. > > > > > > > > > > The implicit naming boils down to not requiring __init__.py files > if no > > > > > initialisation of package is needed. We could do it consistently > for all > > > > > code which is "internal" to airflow. This means that most packages > will not > > > > > have __init__.py there will be few exceptions like 'airflow', > > > > > 'airflow.modules' and likely few others with the __init__.py. > > > > > > > > > > I did a quick check and we have only 25 _init_.py with some logic - > > > > > remaining ~ 140 could be removed as they only contain comments. > > > > > > > > > > find . -name '__init__.py' | xargs grep -le '^[^#].*' | wc -l > > > > > 25 > > > > > find . -name '__init__.py' |wc -l > > > > > 164 > > > > > > > > > > Those are the files that will be left: > > > > > ./tests/__init__.py > > > > > ./tests/utils/log/elasticmock/__init__.py > > > > > ./tests/utils/log/elasticmock/utilities/__init__.py > > > > > ./tests/models/__init__.py > > > > > ./tests/task/__init__.py > > > > > ./airflow/sensors/__init__.py > > > > > ./airflow/operators/__init__.py > > > > > ./airflow/_vendor/nvd3/__init__.py > > > > > ./airflow/_vendor/slugify/__init__.py > > > > > ./airflow/serialization/__init__.py > > > > > ./airflow/__init__.py > > > > > ./airflow/models/__init__.py > > > > > > ./airflow/www/node_modules/npm/node_modules/node-gyp/gyp/pylib/gyp/__init__.py > > > > > ./airflow/ti_deps/deps/__init__.py > > > > > ./airflow/macros/__init__.py > > > > > ./airflow/executors/__init__.py > > > > > ./airflow/lineage/__init__.py > > > > > ./airflow/lineage/backend/__init__.py > > > > > ./airflow/lineage/backend/atlas/__init__.py > > > > > ./airflow/hooks/__init__.py > > > > > ./airflow/task/task_runner/__init__.py > > > > > ./airflow/api/__init__.py > > > > > ./airflow/api/common/experimental/__init__.py > > > > > ./airflow/api/client/__init__.py > > > > > ./airflow/jobs/__init__.py > > > > > > > > > > WDYT? > > > > > > > > > > J > > > > > > > > > > -- > > > > > > > > > > Jarek Potiuk > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer > > > > > > > > > > M: +48 660 796 129 <+48660796129> > > > > > [image: Polidea] <https://www.polidea.com/> > > > > > > > > > > > > > > > -- > > > > > > > > > Jarek Potiuk > > > Polidea (https://www.polidea.com/) | Principal Software Engineer > > > > > > > > > > > > > > > > > > > > > > > > M: +48 660 796 129 (tel:+48660796129) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Jarek Potiuk > > Polidea (https://www.polidea.com/) | Principal Software Engineer > > > > > > > > > > > > > > > > M: +48 660 796 129 (tel:+48660796129) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> [image: Polidea] <https://www.polidea.com/>
