Are you talking about/test with making _ALL_ packages implicit namespaces?

I would think that we would only make airflow.providers an implicit package, 
but leave all the sub packages as explicit packages: i.e. this:
airflow/__init__.py
airflow/utils/__init__.py
airflow/providers/ (no __init__.py)
airflow/providers/google/__init__.py

Does limiting the implicit namespace to just airflow.providers address 1+2+3?
-a
On Feb 5 2020, at 10:27 am, Jarek Potiuk <[email protected]> wrote:
> TL;DR; I am asking for opinions on some of the changes required to introduce 
> implicit native package support. This is the easiest way to make backports of 
> master providers packages to 1.10.x.
>
>
> NOTE!: @Ash Berlin-Taylor (mailto:[email protected]) @Fokko Driesprong 
> (mailto:[email protected]) @Felix Uellendall (mailto:[email protected]) 
> @Kamil Breguła (mailto:[email protected]) @Kaxil Naik 
> (mailto:[email protected]) @[email protected] 
> (mailto:[email protected]) @Philippe Gagnon 
> (mailto:[email protected]) - there is one point 1a) below that I would 
> like to get your input.
>
> As AIP-21 (import paths) import path move is done we have an important step 
> to complete - we should switch to implicit "native" packages. It boils down 
> to removal of empty (only with comment licences) __init__.py files and 
> relying on python 3.3+ capability of finding packages without having to add 
> __init_.py files. This is needed in order to get backported packaged to be 
> prepared for 1.10.* series. I worked on it during the last few days - to make 
> sure this can be done and I am close to have it. Close enough to know I can 
> solve all the remaining problems. I wanted to check that we can do it for 
> almost all the packages. I already solved most of the initial problems but I 
> have some places where I think I need community opinion on the way we should 
> solve them:
>
> The PR that has the changes is here: 
> https://github.com/apache/airflow/pull/7279/files - it's not 100% ready yet 
> and I want to split it into a few separate PRs so do not comment it there yet 
> - I extracted the most important changes here and wanted to ask your opinion 
> where I have doubts:
>
> 1) Module names identical to some top-level package names (for example 
> email.py):
>
> Some of the module names (for example email.py) has to be changed. The 
> problem with implicit packages is that if local module is named the same as 
> the top level package, importing the top-level package from it does not work:
>
> email.py:
> ...
> import email <- this won't work - module imports itself.
>
> For now in my PR i renamed the modules to be airflow_<old_name>.py or 
> <old_name>_utils.py : There are such modules (mostly hooks):
> email -> email_utils, (utils)
>
> cassandra -> airflow_cassandra
>
> cloudant -> airflow_cloudant
>
> dataproc -> airflow_dataproc
>
> grpc -> airlfow_grpc
>
> hdfs -> airflow_hdfs
>
> jira -> airflow_jira
>
> kerberos -> kerberos_security
>
> redis -> airflow_redis
>
> sendgrid -> airflow_sendgrid
>
> snowflake -> airflow_snowflake
>
> winrm -> airflow_winrm (operator)
>
> datadog -> airflow_datatog(sensor)
>
> sqlalchemy -> sqlalchemy_utils (utils)
>
>
>
> I can also change it to grpc_hooks.py, datadog_sensors.py etc. Or maybe 
> someone knows an easy way how to keep the module name and implicit packages?
>
> I guess this is not a big problem to change it this way - we anyhow have 
> changed import paths for those. The only two problems with this are:
> a) email was used in "email_backend" configuration : email_backend = 
> airflow.utils.email.send_email_smtp
> but we can solve it by handling that as special case, raising deprecation 
> warning and converting it to airflow.utils.email_utils.send_email_smtp
> b) we introduce slight inconsistency in hook/operator/sensor names.
>
> 2) Mypy fails when checking individual (duplicated) module names
>
> Mypy does not work well with duplicate module names in implicit packages 
> (https://github.com/pre-commit/mirrors-mypy/issues/5) and with AIP-21 we have 
> plenty of them. Repeating modules (http.py in both operators and hooks for 
> example) cause it to fail in case they are provided as parameters (in case of 
> incremental mypy check in pre-commits. I had to change it then so that mypy 
> always check 'airflow' and 'test' packages instead - which makes pre-commit 
> check slightly slower (few seconds).
>
> 3) _hooks.py/_operators.py/_sensors.py 
> (http://hooks.py/_operators.py/_sensors.py) suffix: Some thought resulting 
> from above 1) and 2):
>
> I did not want to open pandora's box again but I think removal of _hooks, 
> _operators, _sensors from the module name might not have been the best 
> decision. While it removes some duplication in module name, it actually 
> introduces duplicate module names and (as it is with mypy) it might be a 
> problem in the future. I think now we should change it back to add the 
> suffixes. If we want to reverse it - this is the last moment we can do it 
> (and we still can easily). I would love some quick opinion/voting for that - 
> especially those who voted in AIP-21 but others are welcome as well: @Ash 
> Berlin-Taylor (mailto:[email protected]) @Fokko Driesprong 
> (mailto:[email protected]) @Felix Uellendall (mailto:[email protected]) 
> @Kamil Breguła (mailto:[email protected]) @Kaxil Naik 
> (mailto:[email protected]) @[email protected] 
> (mailto:[email protected]) @Philippe Gagnon 
> (mailto:[email protected]) - this was "Case 2" in the original AIP-21 
> proposal.
>
> WDYT?
>
> 4) Test module names:
>
> I had to change all test module name for providers. For example: . Pytest 
> does not like repeating implicit test module names. It throws error 
> "duplicate test module). For example:
> tests/providers/http/hooks/test_http.py → 
> tests/providers/http/hooks/test_http_hooks.py 
> (https://github.com/apache/airflow/pull/7279/files#diff-593e762ce1d79c250426a27b6fb28908)
>
>
> (basically I added hooks/operators/sensors everywhere). It is easy and fully 
> automated and it has no impact at all on the main code base, so I think this 
> is a super-safe change. t The nice thing is that it makes it easier to 
> understand what kind of tests you are looking at immediately and. That also 
> hints that maybe _hooks.py, _operators.py, _sensors.py in 
> hooks/operators/sensors could have been a better choice (see 2 a) )
>
> 5) Doc generation
>
> Autoapi doc generation has not yet released implicit package support (it is 
> in master only). However I managed to monkey-patch 1.2.1 (latest) version to 
> add support by cherry-picking the changes (there are literally few methods 
> changed there - https://github.com/readthedocs/sphinx-autoapi/pull/178) and 
> when new version is released we will be able to get rid of monkey-patching . 
> I am still resolving a few import issues in that newest version but I am 
> confident I can fix or workaround all issues with it.
> I don't expect this to be a problem - it is only in the doc generation code, 
> not the main application code.
>
> I verified that pytest test discovery works fine, and that airflow can be 
> installed and works with implicit packages. Let me know what you think about 
> it and I would love to merge it as soon as possible (rebasing this for a long 
> time might be painful).
>
> J.
>
>
>
>
>
> On Sat, Nov 23, 2019 at 9:22 AM Jarek Potiuk <[email protected] 
> (mailto:[email protected])> wrote:
> > I've read a bit more about PEP-0420 so some more extracted information:
> >
> > - implicit namespace packages are very similar to regular packages. The 
> > notable difference is that they can be loaded from several directories (so 
> > you can have modules in the same package but coming from different physical 
> > directories), so no __file__ is defined for the package (it is still 
> > defined for modules),
> >
> > - pytest should have no problems. It has indeed 
> > (https://stackoverflow.com/questions/50174130/how-do-i-pytest-a-project-using-pep-420-namespace-packages)
> >  problems with implicit packages when tests are co-located with their 
> > tested modules (module.py, module_test.py) but we are not in this situation.
> >
> > - I do not see how it could be slower - I have not found any benchmarking 
> > yet (but I have not found any complaints about it either). I looked at the 
> > loading algorithm for PEP-420 - it traverses all the directory structure 
> > available on the PYTHONPATH bo so does the "standard" mechanism. And we 
> > would not have to parse and load all the 140 licence-only __init__.py. The 
> > presence of module in the package is based on the presence of "<module>.py" 
> > in the right folder. If we would have several packages elsewhere that might 
> > become a bit slower, but I think reading files is so much slower than 
> > scanning directories that - if anything - we will be faster (just opening 
> > and closing those __init__.files will take quite some time :). We can do 
> > some benchmarking before and after to be sure.
> >
> > Side comment: I am not 100% sure, but after reading PEP-420 intuition tells 
> > me that maybe implicit packages can help in solving relative imports 
> > problems that some of our user raised . I saw recently several users had 
> > problems with relative imports used in DAGs. We are loading the DAGs in a 
> > specific way and in implicit packages, the __repr__ of modules parent is 
> > dynamic - based on the location of the file rather than based on the module 
> > of the file, so it's likely if we encourage dag developers also to use 
> > implicit packages, it might actually solve the relative imports case.
> >
> > J.
> > On Fri, Nov 22, 2019 at 4:29 PM Daniel Imberman <[email protected] 
> > (mailto:[email protected])> wrote:
> > > Fine by me if it doesn’t break anything. I’m always on the side of less 
> > > code :).
> > >
> > > via Newton Mail 
> > > [https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.32&pv=10.14.5&source=email_footer_2]
> > > On Fri, Nov 22, 2019 at 8:12 AM, Ash Berlin-Taylor <[email protected] 
> > > (mailto:[email protected])> wrote:
> > > Probably worth it, but things to check:
> > >
> > > - if nose or pytest needs them under tests/ to find all the tests
> > > - if not having the init files slows down pythons package importer as it 
> > > searches more paths?
> > >
> > > -a
> > > > On 22 Nov 2019, at 12:06, Jarek Potiuk <[email protected] 
> > > > (mailto:[email protected])> wrote:
> > > >
> > > > There is one implementation detail in AIP-21 that I continue to have
> > > > questions about, and I wanted to ask community about it.
> > > >
> > > > Since we moved to Python 3, we have the option of using implicit 
> > > > namespace
> > > > packages.
> > > > https://www.python.org/dev/peps/pep-0420/ . We have now grouping per 
> > > > folder
> > > > for services, so it would require a lot more __init__.py files if we
> > > > continue using them.
> > > >
> > > > The implicit naming boils down to not requiring __init__.py files if no
> > > > initialisation of package is needed. We could do it consistently for all
> > > > code which is "internal" to airflow. This means that most packages will 
> > > > not
> > > > have __init__.py there will be few exceptions like 'airflow',
> > > > 'airflow.modules' and likely few others with the __init__.py.
> > > >
> > > > I did a quick check and we have only 25 _init_.py with some logic -
> > > > remaining ~ 140 could be removed as they only contain comments.
> > > >
> > > > find . -name '__init__.py' | xargs grep -le '^[^#].*' | wc -l
> > > > 25
> > > > find . -name '__init__.py' |wc -l
> > > > 164
> > > >
> > > > Those are the files that will be left:
> > > > ./tests/__init__.py
> > > > ./tests/utils/log/elasticmock/__init__.py
> > > > ./tests/utils/log/elasticmock/utilities/__init__.py
> > > > ./tests/models/__init__.py
> > > > ./tests/task/__init__.py
> > > > ./airflow/sensors/__init__.py
> > > > ./airflow/operators/__init__.py
> > > > ./airflow/_vendor/nvd3/__init__.py
> > > > ./airflow/_vendor/slugify/__init__.py
> > > > ./airflow/serialization/__init__.py
> > > > ./airflow/__init__.py
> > > > ./airflow/models/__init__.py
> > > > ./airflow/www/node_modules/npm/node_modules/node-gyp/gyp/pylib/gyp/__init__.py
> > > > ./airflow/ti_deps/deps/__init__.py
> > > > ./airflow/macros/__init__.py
> > > > ./airflow/executors/__init__.py
> > > > ./airflow/lineage/__init__.py
> > > > ./airflow/lineage/backend/__init__.py
> > > > ./airflow/lineage/backend/atlas/__init__.py
> > > > ./airflow/hooks/__init__.py
> > > > ./airflow/task/task_runner/__init__.py
> > > > ./airflow/api/__init__.py
> > > > ./airflow/api/common/experimental/__init__.py
> > > > ./airflow/api/client/__init__.py
> > > > ./airflow/jobs/__init__.py
> > > >
> > > > WDYT?
> > > >
> > > > J
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/>
> >
> >
> >
> >
> > --
> >
> >
> > Jarek Potiuk
> > Polidea (https://www.polidea.com/) | Principal Software Engineer
> >
> >
> >
> >
> >
> >
> >
> > M: +48 660 796 129 (tel:+48660796129)
> >
> >
> >
> >
> >
> >
> >
> >
>
>
> --
>
>
> Jarek Potiuk
> Polidea (https://www.polidea.com/) | Principal Software Engineer
>
>
>
>
>
>
>
> M: +48 660 796 129 (tel:+48660796129)
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

Reply via email to