There's two things I don't like: 1 the name, and that two it's more work than we are needed for (to what I can see) no gain. Making everything an implicit package touches 776 files, when 1 is all that is needed.
To make airflow.providers an implicit namepsace package we need to remove a single file. Doing just that mypy doesn't complain about anything. From a few quick spot tests they still pass too. There is nothing to think about again as everywhere else we follow the normal python convention of having an __init__.py -ash On Feb 5 2020, at 10:48 am, Jarek Potiuk <[email protected]> wrote: > Hey Ash, > > What do you think is the downside of changing all the packages to implicit > Except the -6000 or so useless comments in empty __init__.py ? > > J. > On Wed, Feb 5, 2020 at 11:34 AM Ash Berlin-Taylor <[email protected]> wrote: > > And to be clear: I'd rather we didn't make everything an implicit > > namespace package, but only the "official" extension points of > > airflow.providers. I cant immediately think of a reason to make anything > > else a namespace package, and limiting this to a single place makes the > > change smaller (tiny even?) and also easier to reason about where modules > > might be coming from. > > > > What have I not thought about? > > On Feb 5 2020, at 10:31 am, Ash Berlin-Taylor <[email protected]> wrote: > > > Are you talking about/test with making _ALL_ packages implicit > > > > namespaces? > > > > > > I would think that we would only make airflow.providers an implicit > > package, but leave all the sub packages as explicit packages: i.e. this: > > > airflow/__init__.py > > > airflow/utils/__init__.py > > > airflow/providers/ (no __init__.py) > > > airflow/providers/google/__init__.py > > > > > > Does limiting the implicit namespace to just airflow.providers address > > 1+2+3? > > > -a > > > On Feb 5 2020, at 10:27 am, Jarek Potiuk <[email protected]> > > > > wrote: > > > > TL;DR; I am asking for opinions on some of the changes required to > > > > > > > introduce implicit native package support. This is the easiest way to make > > backports of master providers packages to 1.10.x. > > > > > > > > > > > > NOTE!: @Ash Berlin-Taylor (mailto:[email protected]) @Fokko Driesprong > > (mailto:[email protected]) @Felix Uellendall (mailto:[email protected]) > > @Kamil Breguła (mailto:[email protected]) @Kaxil Naik (mailto: > > [email protected]) @[email protected] (mailto: > > [email protected]) @Philippe Gagnon (mailto: > > [email protected]) - there is one point 1a) below that I would like > > to get your input. > > > > > > > > As AIP-21 (import paths) import path move is done we have an important > > step to complete - we should switch to implicit "native" packages. It boils > > down to removal of empty (only with comment licences) __init__.py files and > > relying on python 3.3+ capability of finding packages without having to add > > __init_.py files. This is needed in order to get backported packaged to be > > prepared for 1.10.* series. I worked on it during the last few days - to > > make sure this can be done and I am close to have it. Close enough to know > > I can solve all the remaining problems. I wanted to check that we can do it > > for almost all the packages. I already solved most of the initial problems > > but I have some places where I think I need community opinion on the way we > > should solve them: > > > > > > > > The PR that has the changes is here: > > https://github.com/apache/airflow/pull/7279/files - it's not 100% ready > > yet and I want to split it into a few separate PRs so do not comment it > > there yet - I extracted the most important changes here and wanted to ask > > your opinion where I have doubts: > > > > > > > > 1) Module names identical to some top-level package names (for example > > email.py): > > > > > > > > Some of the module names (for example email.py) has to be changed. The > > problem with implicit packages is that if local module is named the same as > > the top level package, importing the top-level package from it does not > > work: > > > > > > > > email.py: > > > > ... > > > > import email <- this won't work - module imports itself. > > > > > > > > For now in my PR i renamed the modules to be airflow_<old_name>.py or > > > > <old_name>_utils.py : There are such modules (mostly hooks): > > > > email -> email_utils, (utils) > > > > > > > > cassandra -> airflow_cassandra > > > > cloudant -> airflow_cloudant > > > > dataproc -> airflow_dataproc > > > > grpc -> airlfow_grpc > > > > hdfs -> airflow_hdfs > > > > jira -> airflow_jira > > > > kerberos -> kerberos_security > > > > redis -> airflow_redis > > > > sendgrid -> airflow_sendgrid > > > > snowflake -> airflow_snowflake > > > > winrm -> airflow_winrm (operator) > > > > datadog -> airflow_datatog(sensor) > > > > sqlalchemy -> sqlalchemy_utils (utils) > > > > > > > > > > > > > > > > I can also change it to grpc_hooks.py, datadog_sensors.py etc. Or > > maybe someone knows an easy way how to keep the module name and implicit > > packages? > > > > > > > > I guess this is not a big problem to change it this way - we anyhow > > have changed import paths for those. The only two problems with this are: > > > > a) email was used in "email_backend" configuration : email_backend = > > > > > > > airflow.utils.email.send_email_smtp > > > > but we can solve it by handling that as special case, raising > > > > > > > deprecation warning and converting it to > > airflow.utils.email_utils.send_email_smtp > > > > b) we introduce slight inconsistency in hook/operator/sensor names. > > > > > > > > 2) Mypy fails when checking individual (duplicated) module names > > > > Mypy does not work well with duplicate module names in implicit > > packages (https://github.com/pre-commit/mirrors-mypy/issues/5) and with > > AIP-21 we have plenty of them. Repeating modules (http.py in both operators > > and hooks for example) cause it to fail in case they are provided as > > parameters (in case of incremental mypy check in pre-commits. I had to > > change it then so that mypy always check 'airflow' and 'test' packages > > instead - which makes pre-commit check slightly slower (few seconds). > > > > > > > > 3) _hooks.py/_operators.py/_sensors.py ( > > http://hooks.py/_operators.py/_sensors.py) suffix: Some thought resulting > > from above 1) and 2): > > > > > > > > I did not want to open pandora's box again but I think removal of > > _hooks, _operators, _sensors from the module name might not have been the > > best decision. While it removes some duplication in module name, it > > actually introduces duplicate module names and (as it is with mypy) it > > might be a problem in the future. I think now we should change it back to > > add the suffixes. If we want to reverse it - this is the last moment we can > > do it (and we still can easily). I would love some quick opinion/voting for > > that - especially those who voted in AIP-21 but others are welcome as well: > > @Ash Berlin-Taylor (mailto:[email protected]) @Fokko Driesprong (mailto: > > [email protected]) @Felix Uellendall (mailto:[email protected]) @Kamil > > Breguła (mailto:[email protected]) @Kaxil Naik (mailto: > > [email protected]) @[email protected] (mailto: > > [email protected]) @Philippe Gagnon (mailto: > > [email protected]) - this was "Case 2" in the original AIP-21 > > proposal. > > > > > > > > WDYT? > > > > 4) Test module names: > > > > I had to change all test module name for providers. For example: . > > Pytest does not like repeating implicit test module names. It throws error > > "duplicate test module). For example: > > > > tests/providers/http/hooks/test_http.py → > > > > > > > tests/providers/http/hooks/test_http_hooks.py ( > > https://github.com/apache/airflow/pull/7279/files#diff-593e762ce1d79c250426a27b6fb28908 > > ) > > > > > > > > > > > > (basically I added hooks/operators/sensors everywhere). It is easy and > > fully automated and it has no impact at all on the main code base, so I > > think this is a super-safe change. t The nice thing is that it makes it > > easier to understand what kind of tests you are looking at immediately and. > > That also hints that maybe _hooks.py, _operators.py, _sensors.py in > > hooks/operators/sensors could have been a better choice (see 2 a) ) > > > > > > > > 5) Doc generation > > > > Autoapi doc generation has not yet released implicit package support > > (it is in master only). However I managed to monkey-patch 1.2.1 (latest) > > version to add support by cherry-picking the changes (there are literally > > few methods changed there - > > https://github.com/readthedocs/sphinx-autoapi/pull/178) and when new > > version is released we will be able to get rid of monkey-patching . I am > > still resolving a few import issues in that newest version but I am > > confident I can fix or workaround all issues with it. > > > > I don't expect this to be a problem - it is only in the doc generation > > > > > > > code, not the main application code. > > > > > > > > I verified that pytest test discovery works fine, and that airflow can > > be installed and works with implicit packages. Let me know what you think > > about it and I would love to merge it as soon as possible (rebasing this > > for a long time might be painful). > > > > > > > > J. > > > > > > > > > > > > > > > > > > > > On Sat, Nov 23, 2019 at 9:22 AM Jarek Potiuk <[email protected] > > (mailto:[email protected])> wrote: > > > > > I've read a bit more about PEP-0420 so some more extracted > > > > > > > > > > > information: > > > > > > > > > > - implicit namespace packages are very similar to regular packages. > > The notable difference is that they can be loaded from several directories > > (so you can have modules in the same package but coming from different > > physical directories), so no __file__ is defined for the package (it is > > still defined for modules), > > > > > > > > > > - pytest should have no problems. It has indeed ( > > https://stackoverflow.com/questions/50174130/how-do-i-pytest-a-project-using-pep-420-namespace-packages) > > problems with implicit packages when tests are co-located with their tested > > modules (module.py, module_test.py) but we are not in this situation. > > > > > > > > > > - I do not see how it could be slower - I have not found any > > benchmarking yet (but I have not found any complaints about it either). I > > looked at the loading algorithm for PEP-420 - it traverses all the > > directory structure available on the PYTHONPATH bo so does the "standard" > > mechanism. And we would not have to parse and load all the 140 licence-only > > __init__.py. The presence of module in the package is based on the presence > > of "<module>.py" in the right folder. If we would have several packages > > elsewhere that might become a bit slower, but I think reading files is so > > much slower than scanning directories that - if anything - we will be > > faster (just opening and closing those __init__.files will take quite some > > time :). We can do some benchmarking before and after to be sure. > > > > > > > > > > Side comment: I am not 100% sure, but after reading PEP-420 > > intuition tells me that maybe implicit packages can help in solving > > relative imports problems that some of our user raised . I saw recently > > several users had problems with relative imports used in DAGs. We are > > loading the DAGs in a specific way and in implicit packages, the __repr__ > > of modules parent is dynamic - based on the location of the file rather > > than based on the module of the file, so it's likely if we encourage dag > > developers also to use implicit packages, it might actually solve the > > relative imports case. > > > > > > > > > > J. > > > > > On Fri, Nov 22, 2019 at 4:29 PM Daniel Imberman < > > > > > > > > > > > [email protected] (mailto:[email protected])> wrote: > > > > > > Fine by me if it doesn’t break anything. I’m always on the side of > > > > > > > > > > > > > > > > less code :). > > > > > > > > > > > > via Newton Mail [ > > https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.32&pv=10.14.5&source=email_footer_2 > > ] > > > > > > On Fri, Nov 22, 2019 at 8:12 AM, Ash Berlin-Taylor <[email protected] > > > > > > > > > > > > > > > > (mailto:[email protected])> wrote: > > > > > > Probably worth it, but things to check: > > > > > > > > > > > > - if nose or pytest needs them under tests/ to find all the tests > > > > > > - if not having the init files slows down pythons package importer > > > > > > > > > > > > > > > > as it searches more paths? > > > > > > > > > > > > -a > > > > > > > On 22 Nov 2019, at 12:06, Jarek Potiuk <[email protected] > > > > > > > > > > > > > > > > > > > > > > (mailto:[email protected])> wrote: > > > > > > > > > > > > > > There is one implementation detail in AIP-21 that I continue to > > have > > > > > > > questions about, and I wanted to ask community about it. > > > > > > > > > > > > > > Since we moved to Python 3, we have the option of using implicit > > namespace > > > > > > > packages. > > > > > > > https://www.python.org/dev/peps/pep-0420/ . We have now > > > > > > > > > > > > > > > > > > > > > > grouping per folder > > > > > > > for services, so it would require a lot more __init__.py files > > > > > > > > > > > > > > > > > > > > > > if we > > > > > > > continue using them. > > > > > > > > > > > > > > The implicit naming boils down to not requiring __init__.py > > files if no > > > > > > > initialisation of package is needed. We could do it consistently > > > > > > > > > > > > > > > > > > > > > > for all > > > > > > > code which is "internal" to airflow. This means that most > > > > > > > > > > > > > > > > > > > > > > packages will not > > > > > > > have __init__.py there will be few exceptions like 'airflow', > > > > > > > 'airflow.modules' and likely few others with the __init__.py. > > > > > > > > > > > > > > I did a quick check and we have only 25 _init_.py with some > > logic - > > > > > > > remaining ~ 140 could be removed as they only contain comments. > > > > > > > > > > > > > > find . -name '__init__.py' | xargs grep -le '^[^#].*' | wc -l > > > > > > > 25 > > > > > > > find . -name '__init__.py' |wc -l > > > > > > > 164 > > > > > > > > > > > > > > Those are the files that will be left: > > > > > > > ./tests/__init__.py > > > > > > > ./tests/utils/log/elasticmock/__init__.py > > > > > > > ./tests/utils/log/elasticmock/utilities/__init__.py > > > > > > > ./tests/models/__init__.py > > > > > > > ./tests/task/__init__.py > > > > > > > ./airflow/sensors/__init__.py > > > > > > > ./airflow/operators/__init__.py > > > > > > > ./airflow/_vendor/nvd3/__init__.py > > > > > > > ./airflow/_vendor/slugify/__init__.py > > > > > > > ./airflow/serialization/__init__.py > > > > > > > ./airflow/__init__.py > > > > > > > ./airflow/models/__init__.py > > > > > > > > > ./airflow/www/node_modules/npm/node_modules/node-gyp/gyp/pylib/gyp/__init__.py > > > > > > > ./airflow/ti_deps/deps/__init__.py > > > > > > > ./airflow/macros/__init__.py > > > > > > > ./airflow/executors/__init__.py > > > > > > > ./airflow/lineage/__init__.py > > > > > > > ./airflow/lineage/backend/__init__.py > > > > > > > ./airflow/lineage/backend/atlas/__init__.py > > > > > > > ./airflow/hooks/__init__.py > > > > > > > ./airflow/task/task_runner/__init__.py > > > > > > > ./airflow/api/__init__.py > > > > > > > ./airflow/api/common/experimental/__init__.py > > > > > > > ./airflow/api/client/__init__.py > > > > > > > ./airflow/jobs/__init__.py > > > > > > > > > > > > > > WDYT? > > > > > > > J > > > > > > > -- > > > > > > > Jarek Potiuk > > > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer > > > > > > > > > > > > > > M: +48 660 796 129 <+48660796129> > > > > > > > [image: Polidea] <https://www.polidea.com/> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Jarek Potiuk > > > > > Polidea (https://www.polidea.com/) | Principal Software Engineer > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > M: +48 660 796 129 (tel:+48660796129) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Jarek Potiuk > > > > Polidea (https://www.polidea.com/) | Principal Software Engineer > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > M: +48 660 796 129 (tel:+48660796129) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > Jarek Potiuk > Polidea <https://www.polidea.com/> | Principal Software Engineer > > M: +48 660 796 129 <+48660796129> > [image: Polidea] <https://www.polidea.com/> >
