I don't think such quick assessment on a "global level" with the size of
change is complete or even correct - the alternative is more than "1 file"
change.

I think we should all try to understand what is changing, the reasoning and
consequences, and not only to look at the total numbers. Particularly I do
not think "size" of change is the only or even most important criteria in
case of such refactor. It's one of the criteria IMHO, but there are others
- one of them is being forward-compatible as well and preventing/limiting
potential backwards-incompatible changes in the future.

So I would like that we - as a community - make an informed decision about
it without over-simplifications and biases. I have a kind request to those
interested in AIP-21 - can you please take a look and comment? I would not
like to be held by this for a long time, but I am super happy to discuss it
within the next few days and make voting about it (and in the meantime I
will test the "simpler" approach with just removing 1 file (and everything
that follows) and see if it works. We will then know more about what kind
and size of change we are comparing it to.

And if - as a community - we come to the conclusion that it's better to
make smaller change (and we prove that it works) - I am happy to follow
this.

Firstly - just to be precise about stats of the change:

   - it removes 389 files (__init__.py files that contain only commented
   license information) - we would not have to worry about those in the future
   - it moves 296 file that just have been moved in AIP-21 and nobody
   started using them yet. There is no problem with moving them again.
   - it updates 19 files in 'contrib' that only contain deprecation notice
   and have just been moved in AIP-21
   - it corrects single import statements or updates test-metadata, list of
   packages, pre-commit checks, using implicit packages semantics for
   importlib  etc. in the remaining 73 files

Secondly. I would not jump into conclusion that alternative is just "remove
__init__.py" from providers. It's vast over-simplification. Even if it
works for backported packages case, there are a number of changes needed to
make it work for master. Starting from setup.py, doc generation, package
preparation. Some part of the 73 files above will have to be done anyway if
we make the 'providers' package an implicit one. And I have not yet started
changes to setup.py to prepare the backport packages - I wanted to do it
after we make sure we have the "__init__.py" question solved for AIP-21.
And I think making packages implicit will make it simpler.

My tests before tested that everything works if all packages in providers
were made implicit. I do not know - because I have not tested it - if it is
going to work if I just remove __init__.py from providers. I might test it
to see if it works, and will let you know - but in the meantime I would
love your opinion on the statements above - in case we decide to move. And
even if it works, it's not the only change - for one for example we will
have to make changes to auto-api anyway because it does not work currently
with implicit packages at all. and we might find that we have to make other
changes as well.

Thirdly: Let my add my reasoning here as well why I think we should convert
everything to implicit packages now. I think it is for the benefit of our
users and limiting future backwards incompatible changes.

I think we are making an important, backwards incompatible move now
with AIP-21 (we've agreed to that already). My tests with it have shown
that IF we start using implicit packages at some point in time, there are
certain consequences that we have to deal with sooner or later. And my
point here is that if we already know that duplicated module names might be
a problem, and that we are already started using the implicit packages  -
why should we not fix everything now - to avoid another breaking change in
the future. I think this is perfect timing to make such a move now rather
than waiting for the future. We've already made a bold move with AIP-21 and
we are just about (via backporting) to make it used by our users - so we
better make sure we do not have to make them adapt twice in the near
future.

I think it's painless now, where it will be very painful if we make it in
the future.

That's the reason why I worked on detecting and solving such problems now -
to be better prepared for the discussion (and be informed about problems
caused potentially by AIP-21 move) and to make sure that what we finally
come up as AIP-21 (with backporting) is future-proof and we will not have
change it again in the near future.

If my tests will show that we can only do providers.__init__ removal and we
can make everything works for backported packages. I am going to take a
look at it now.

But I would like that as a community we decide which road we take now. I am
happy to cast a vote on it after we hear the opinions of the community.

J.



On Wed, Feb 5, 2020 at 11:58 AM Ash Berlin-Taylor <[email protected]> wrote:

> There's two things I don't like: 1 the name, and that two it's more work
> than we are needed for (to what I can see) no gain. Making everything an
> implicit package touches 776 files, when 1 is all that is needed.
>
> To make airflow.providers an implicit namepsace package we need to remove
> a single file. Doing just that mypy doesn't complain about anything. From a
> few quick spot tests they still pass too.
> There is nothing to think about again as everywhere else we follow the
> normal python convention of having an __init__.py
> -ash
> On Feb 5 2020, at 10:48 am, Jarek Potiuk <[email protected]> wrote:
> > Hey Ash,
> >
> > What do you think is the downside of changing all the packages to
> implicit
> > Except the -6000 or so useless comments in empty __init__.py ?
> >
> > J.
> > On Wed, Feb 5, 2020 at 11:34 AM Ash Berlin-Taylor <[email protected]>
> wrote:
> > > And to be clear: I'd rather we didn't make everything an implicit
> > > namespace package, but only the "official" extension points of
> > > airflow.providers. I cant immediately think of a reason to make
> anything
> > > else a namespace package, and limiting this to a single place makes the
> > > change smaller (tiny even?) and also easier to reason about where
> modules
> > > might be coming from.
> > >
> > > What have I not thought about?
> > > On Feb 5 2020, at 10:31 am, Ash Berlin-Taylor <[email protected]> wrote:
> > > > Are you talking about/test with making _ALL_ packages implicit
> > >
> > > namespaces?
> > > >
> > > > I would think that we would only make airflow.providers an implicit
> > > package, but leave all the sub packages as explicit packages: i.e.
> this:
> > > > airflow/__init__.py
> > > > airflow/utils/__init__.py
> > > > airflow/providers/ (no __init__.py)
> > > > airflow/providers/google/__init__.py
> > > >
> > > > Does limiting the implicit namespace to just airflow.providers
> address
> > > 1+2+3?
> > > > -a
> > > > On Feb 5 2020, at 10:27 am, Jarek Potiuk <[email protected]>
> > >
> > > wrote:
> > > > > TL;DR; I am asking for opinions on some of the changes required to
> > > >
> > >
> > > introduce implicit native package support. This is the easiest way to
> make
> > > backports of master providers packages to 1.10.x.
> > > > >
> > > > >
> > > > > NOTE!: @Ash Berlin-Taylor (mailto:[email protected]) @Fokko
> Driesprong
> > > (mailto:[email protected]) @Felix Uellendall (mailto:[email protected]
> )
> > > @Kamil Breguła (mailto:[email protected]) @Kaxil Naik (mailto:
> > > [email protected]) @[email protected] (mailto:
> > > [email protected]) @Philippe Gagnon (mailto:
> > > [email protected]) - there is one point 1a) below that I would
> like
> > > to get your input.
> > > > >
> > > > > As AIP-21 (import paths) import path move is done we have an
> important
> > > step to complete - we should switch to implicit "native" packages. It
> boils
> > > down to removal of empty (only with comment licences) __init__.py
> files and
> > > relying on python 3.3+ capability of finding packages without having
> to add
> > > __init_.py files. This is needed in order to get backported packaged
> to be
> > > prepared for 1.10.* series. I worked on it during the last few days -
> to
> > > make sure this can be done and I am close to have it. Close enough to
> know
> > > I can solve all the remaining problems. I wanted to check that we can
> do it
> > > for almost all the packages. I already solved most of the initial
> problems
> > > but I have some places where I think I need community opinion on the
> way we
> > > should solve them:
> > > > >
> > > > > The PR that has the changes is here:
> > > https://github.com/apache/airflow/pull/7279/files - it's not 100%
> ready
> > > yet and I want to split it into a few separate PRs so do not comment it
> > > there yet - I extracted the most important changes here and wanted to
> ask
> > > your opinion where I have doubts:
> > > > >
> > > > > 1) Module names identical to some top-level package names (for
> example
> > > email.py):
> > > > >
> > > > > Some of the module names (for example email.py) has to be changed.
> The
> > > problem with implicit packages is that if local module is named the
> same as
> > > the top level package, importing the top-level package from it does not
> > > work:
> > > > >
> > > > > email.py:
> > > > > ...
> > > > > import email <- this won't work - module imports itself.
> > > > >
> > > > > For now in my PR i renamed the modules to be airflow_<old_name>.py
> or
> > > > > <old_name>_utils.py : There are such modules (mostly hooks):
> > > > > email -> email_utils, (utils)
> > > > >
> > > > > cassandra -> airflow_cassandra
> > > > > cloudant -> airflow_cloudant
> > > > > dataproc -> airflow_dataproc
> > > > > grpc -> airlfow_grpc
> > > > > hdfs -> airflow_hdfs
> > > > > jira -> airflow_jira
> > > > > kerberos -> kerberos_security
> > > > > redis -> airflow_redis
> > > > > sendgrid -> airflow_sendgrid
> > > > > snowflake -> airflow_snowflake
> > > > > winrm -> airflow_winrm (operator)
> > > > > datadog -> airflow_datatog(sensor)
> > > > > sqlalchemy -> sqlalchemy_utils (utils)
> > > > >
> > > > >
> > > > >
> > > > > I can also change it to grpc_hooks.py, datadog_sensors.py etc. Or
> > > maybe someone knows an easy way how to keep the module name and
> implicit
> > > packages?
> > > > >
> > > > > I guess this is not a big problem to change it this way - we anyhow
> > > have changed import paths for those. The only two problems with this
> are:
> > > > > a) email was used in "email_backend" configuration : email_backend
> =
> > > >
> > >
> > > airflow.utils.email.send_email_smtp
> > > > > but we can solve it by handling that as special case, raising
> > > >
> > >
> > > deprecation warning and converting it to
> > > airflow.utils.email_utils.send_email_smtp
> > > > > b) we introduce slight inconsistency in hook/operator/sensor names.
> > > > >
> > > > > 2) Mypy fails when checking individual (duplicated) module names
> > > > > Mypy does not work well with duplicate module names in implicit
> > > packages (https://github.com/pre-commit/mirrors-mypy/issues/5) and
> with
> > > AIP-21 we have plenty of them. Repeating modules (http.py in both
> operators
> > > and hooks for example) cause it to fail in case they are provided as
> > > parameters (in case of incremental mypy check in pre-commits. I had to
> > > change it then so that mypy always check 'airflow' and 'test' packages
> > > instead - which makes pre-commit check slightly slower (few seconds).
> > > > >
> > > > > 3) _hooks.py/_operators.py/_sensors.py (
> > > http://hooks.py/_operators.py/_sensors.py) suffix: Some thought
> resulting
> > > from above 1) and 2):
> > > > >
> > > > > I did not want to open pandora's box again but I think removal of
> > > _hooks, _operators, _sensors from the module name might not have been
> the
> > > best decision. While it removes some duplication in module name, it
> > > actually introduces duplicate module names and (as it is with mypy) it
> > > might be a problem in the future. I think now we should change it back
> to
> > > add the suffixes. If we want to reverse it - this is the last moment
> we can
> > > do it (and we still can easily). I would love some quick
> opinion/voting for
> > > that - especially those who voted in AIP-21 but others are welcome as
> well:
> > > @Ash Berlin-Taylor (mailto:[email protected]) @Fokko Driesprong (mailto:
> > > [email protected]) @Felix Uellendall (mailto:[email protected]) @Kamil
> > > Breguła (mailto:[email protected]) @Kaxil Naik (mailto:
> > > [email protected]) @[email protected] (mailto:
> > > [email protected]) @Philippe Gagnon (mailto:
> > > [email protected]) - this was "Case 2" in the original AIP-21
> > > proposal.
> > > > >
> > > > > WDYT?
> > > > > 4) Test module names:
> > > > > I had to change all test module name for providers. For example: .
> > > Pytest does not like repeating implicit test module names. It throws
> error
> > > "duplicate test module). For example:
> > > > > tests/providers/http/hooks/test_http.py →
> > > >
> > >
> > > tests/providers/http/hooks/test_http_hooks.py (
> > >
> https://github.com/apache/airflow/pull/7279/files#diff-593e762ce1d79c250426a27b6fb28908
> > > )
> > > > >
> > > > >
> > > > > (basically I added hooks/operators/sensors everywhere). It is easy
> and
> > > fully automated and it has no impact at all on the main code base, so I
> > > think this is a super-safe change. t The nice thing is that it makes it
> > > easier to understand what kind of tests you are looking at immediately
> and.
> > > That also hints that maybe _hooks.py, _operators.py, _sensors.py in
> > > hooks/operators/sensors could have been a better choice (see 2 a) )
> > > > >
> > > > > 5) Doc generation
> > > > > Autoapi doc generation has not yet released implicit package
> support
> > > (it is in master only). However I managed to monkey-patch 1.2.1
> (latest)
> > > version to add support by cherry-picking the changes (there are
> literally
> > > few methods changed there -
> > > https://github.com/readthedocs/sphinx-autoapi/pull/178) and when new
> > > version is released we will be able to get rid of monkey-patching . I
> am
> > > still resolving a few import issues in that newest version but I am
> > > confident I can fix or workaround all issues with it.
> > > > > I don't expect this to be a problem - it is only in the doc
> generation
> > > >
> > >
> > > code, not the main application code.
> > > > >
> > > > > I verified that pytest test discovery works fine, and that airflow
> can
> > > be installed and works with implicit packages. Let me know what you
> think
> > > about it and I would love to merge it as soon as possible (rebasing
> this
> > > for a long time might be painful).
> > > > >
> > > > > J.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Nov 23, 2019 at 9:22 AM Jarek Potiuk <
> [email protected]
> > > (mailto:[email protected])> wrote:
> > > > > > I've read a bit more about PEP-0420 so some more extracted
> > > > >
> > > >
> > >
> > > information:
> > > > > >
> > > > > > - implicit namespace packages are very similar to regular
> packages.
> > > The notable difference is that they can be loaded from several
> directories
> > > (so you can have modules in the same package but coming from different
> > > physical directories), so no __file__ is defined for the package (it is
> > > still defined for modules),
> > > > > >
> > > > > > - pytest should have no problems. It has indeed (
> > >
> https://stackoverflow.com/questions/50174130/how-do-i-pytest-a-project-using-pep-420-namespace-packages
> )
> > > problems with implicit packages when tests are co-located with their
> tested
> > > modules (module.py, module_test.py) but we are not in this situation.
> > > > > >
> > > > > > - I do not see how it could be slower - I have not found any
> > > benchmarking yet (but I have not found any complaints about it
> either). I
> > > looked at the loading algorithm for PEP-420 - it traverses all the
> > > directory structure available on the PYTHONPATH bo so does the
> "standard"
> > > mechanism. And we would not have to parse and load all the 140
> licence-only
> > > __init__.py. The presence of module in the package is based on the
> presence
> > > of "<module>.py" in the right folder. If we would have several packages
> > > elsewhere that might become a bit slower, but I think reading files is
> so
> > > much slower than scanning directories that - if anything - we will be
> > > faster (just opening and closing those __init__.files will take quite
> some
> > > time :). We can do some benchmarking before and after to be sure.
> > > > > >
> > > > > > Side comment: I am not 100% sure, but after reading PEP-420
> > > intuition tells me that maybe implicit packages can help in solving
> > > relative imports problems that some of our user raised . I saw recently
> > > several users had problems with relative imports used in DAGs. We are
> > > loading the DAGs in a specific way and in implicit packages, the
> __repr__
> > > of modules parent is dynamic - based on the location of the file rather
> > > than based on the module of the file, so it's likely if we encourage
> dag
> > > developers also to use implicit packages, it might actually solve the
> > > relative imports case.
> > > > > >
> > > > > > J.
> > > > > > On Fri, Nov 22, 2019 at 4:29 PM Daniel Imberman <
> > > > >
> > > >
> > >
> > > [email protected] (mailto:[email protected])> wrote:
> > > > > > > Fine by me if it doesn’t break anything. I’m always on the
> side of
> > > > > >
> > > > >
> > > >
> > >
> > > less code :).
> > > > > > >
> > > > > > > via Newton Mail [
> > >
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.32&pv=10.14.5&source=email_footer_2
> > > ]
> > > > > > > On Fri, Nov 22, 2019 at 8:12 AM, Ash Berlin-Taylor <
> [email protected]
> > > > > >
> > > > >
> > > >
> > >
> > > (mailto:[email protected])> wrote:
> > > > > > > Probably worth it, but things to check:
> > > > > > >
> > > > > > > - if nose or pytest needs them under tests/ to find all the
> tests
> > > > > > > - if not having the init files slows down pythons package
> importer
> > > > > >
> > > > >
> > > >
> > >
> > > as it searches more paths?
> > > > > > >
> > > > > > > -a
> > > > > > > > On 22 Nov 2019, at 12:06, Jarek Potiuk <
> [email protected]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > (mailto:[email protected])> wrote:
> > > > > > > >
> > > > > > > > There is one implementation detail in AIP-21 that I continue
> to
> > > have
> > > > > > > > questions about, and I wanted to ask community about it.
> > > > > > > >
> > > > > > > > Since we moved to Python 3, we have the option of using
> implicit
> > > namespace
> > > > > > > > packages.
> > > > > > > > https://www.python.org/dev/peps/pep-0420/ . We have now
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > grouping per folder
> > > > > > > > for services, so it would require a lot more __init__.py
> files
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > if we
> > > > > > > > continue using them.
> > > > > > > >
> > > > > > > > The implicit naming boils down to not requiring __init__.py
> > > files if no
> > > > > > > > initialisation of package is needed. We could do it
> consistently
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > for all
> > > > > > > > code which is "internal" to airflow. This means that most
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > packages will not
> > > > > > > > have __init__.py there will be few exceptions like 'airflow',
> > > > > > > > 'airflow.modules' and likely few others with the __init__.py.
> > > > > > > >
> > > > > > > > I did a quick check and we have only 25 _init_.py with some
> > > logic -
> > > > > > > > remaining ~ 140 could be removed as they only contain
> comments.
> > > > > > > >
> > > > > > > > find . -name '__init__.py' | xargs grep -le '^[^#].*' | wc -l
> > > > > > > > 25
> > > > > > > > find . -name '__init__.py' |wc -l
> > > > > > > > 164
> > > > > > > >
> > > > > > > > Those are the files that will be left:
> > > > > > > > ./tests/__init__.py
> > > > > > > > ./tests/utils/log/elasticmock/__init__.py
> > > > > > > > ./tests/utils/log/elasticmock/utilities/__init__.py
> > > > > > > > ./tests/models/__init__.py
> > > > > > > > ./tests/task/__init__.py
> > > > > > > > ./airflow/sensors/__init__.py
> > > > > > > > ./airflow/operators/__init__.py
> > > > > > > > ./airflow/_vendor/nvd3/__init__.py
> > > > > > > > ./airflow/_vendor/slugify/__init__.py
> > > > > > > > ./airflow/serialization/__init__.py
> > > > > > > > ./airflow/__init__.py
> > > > > > > > ./airflow/models/__init__.py
> > > > > > > >
> > >
> ./airflow/www/node_modules/npm/node_modules/node-gyp/gyp/pylib/gyp/__init__.py
> > > > > > > > ./airflow/ti_deps/deps/__init__.py
> > > > > > > > ./airflow/macros/__init__.py
> > > > > > > > ./airflow/executors/__init__.py
> > > > > > > > ./airflow/lineage/__init__.py
> > > > > > > > ./airflow/lineage/backend/__init__.py
> > > > > > > > ./airflow/lineage/backend/atlas/__init__.py
> > > > > > > > ./airflow/hooks/__init__.py
> > > > > > > > ./airflow/task/task_runner/__init__.py
> > > > > > > > ./airflow/api/__init__.py
> > > > > > > > ./airflow/api/common/experimental/__init__.py
> > > > > > > > ./airflow/api/client/__init__.py
> > > > > > > > ./airflow/jobs/__init__.py
> > > > > > > >
> > > > > > > > WDYT?
> > > > > > > > J
> > > > > > > > --
> > > > > > > > Jarek Potiuk
> > > > > > > > Polidea <https://www.polidea.com/> | Principal Software
> Engineer
> > > > > > > >
> > > > > > > > M: +48 660 796 129 <+48660796129>
> > > > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Jarek Potiuk
> > > > > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > M: +48 660 796 129 (tel:+48660796129)
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Jarek Potiuk
> > > > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > M: +48 660 796 129 (tel:+48660796129)
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> > --
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Reply via email to