One comment on all of this:

I'm not a fan of a _requiring_ separate api server component to be run - for 
smaller deployments (of which there are a lot!) this increases the memory 
requirements by a noticeable fraction.

-ash

On 19 January 2020 19:58:34 GMT+03:00, Jarek Potiuk <jarek.pot...@polidea.com> 
wrote:
>Some comments from my side:
>
>Bolke:
>
>Direct database access for users is a no go. Direct queries that in
>anyway
>> manipulate the database or any state in Airfl ow (trigger dags,
>connections
>> etc) from the CLI should be prevented.
>
>
>Absolutely agree in principle. But we have to discuss and accept
>profound
>consequences
>of such change. For me - this also means that those commands cannot be
>run
>on the machine where airflow is installed and CLI users should have no
>access to those
>machines - otherwise they can run arbitrary python method/class script
>using airflow and
>access the DB directly (it also means that there direct DB connection
>should not be allowed)
>This pretty much calls for either using scheduler as API server or
>introduce a new
>"API-server" entity.
>
>This is quite a departure from the current Airflow deployment model and
>I
>think it should be
>"very breaking change" (architecturally/deployment wise). It could
>still
>look like a local client for
>the user - so it's not a breaking change from user point of view.
>
>This also means that we do not need maybe 95% of airflow code and that
>we
>should either
>not use "airflow" package (airflow-cli maybe?), or do something with
>the
>"__init__.py" of
>the 'airflow' package. Currently we have a very bad import model if we
>want
>to use "airflow" package
>for everything. By doing 'import airflow.<whatever>" we are pulling in
>almost all airflow code
>because it is either directly or indirectly imported in the
>airflow/__init__.py. Including DB
>models. If we stick to having this __init__.py it means that we will
>continue to have many
>seconds of delays for starting up the CLI. I think it is very bad to
>have
>seconds delays.
>
>For me it really calls even for another package ("airflow-cli") that
>should
>be installable
>separately, Surely for local installation/development etc.  both could
>be
>installed alongside and look
>and behave as the current airflow does - and use localhost for
>communication (and even we
>should have the same backwards-compatible "airflow" binary). We could
>gradually migrate all
>the current CLI methods to switch to the new APIs.
>
>I fundamentally disagree with your position that remote working with
>> Airflow should be done through the Web UI.
>
>
>I agree. As I see it - all the modern tools for cloud world have all
>three
>modes for management -
>Web UI (mostly used for reading and discovery), CLI - mostly used for
>management and sometimes
>for querying, and REST-ish APIS (for whatever custom integration is
>needed).  Look at docker, gcloud
>and many others. From our survey - it's clear that this is also similar
>in
>our case. UI is used for monitoring
>accessing logs etc more frequently than for management, and CLI - while
>used less frequently in general
>has a strong usage as well.
>
>On your points:
>>
>> 1-3: Solved by using FAB
>>
>
>Indeed those three points might be solved by using FAB REST API. I am
>not
>100% sure if FAB is the best
>choice (although I think it is a clear contender). What I really like
>about
>it that we seem to have a way to
>use the same mechanism for authorisation/authentication as in UI. Using
>a
>standard approach here seems
>like a good idea, especially that out-of-the-box we can use all the
>"standard" authentication
>mechanisms:
>https://flask-appbuilder.readthedocs.io/en/latest/security.html# .
>(DB, OpenAI, LDAP, Oauth, custom) -
>they are all available out-of-the-box. What I also like is that FAB
>REST
>API has everything you expect from
>a modern API (
>https://flask-appbuilder.readthedocs.io/en/latest/rest_api.html#openapi-spec
>)
>It implements "open-api" standard, has built-in swagger integration and
>the
>capability of not only generating the
>documentation, having a swagger test server for API testing but also
>capability of automatically generating
>clients in different languages (I'd love however to see POC showing all
>that capabilities).
>IMHO any proposal for API implementation will have to have all those
>properties and
>FAB is hard to beat on that especially that we already use it..
>
>
>> 4. You main points are solved by using FAB. On B) I don’t think it is
>a
>> good idea to have an extra layer between json_client and
>local_client. Both
>> are pretty thin wrappers around `airflow.api.common.*`.
>>
>
>I think we should get rid of the local client asap and introduce only
>REST
>API and API server. It's inevitable
>IMHO and the sooner we introduce it - the better. This way we could
>immediately compress the layers there
>to 1 layer.
>
>
>> 5. Non existent ;-)
>>
>Off-by-one indeed :)
>
>
>> 6. I do agree that there is some code duplication going on, but its
>> intention was there in order to have it in place until `local_client`
>> wasn’t required anymore and the Rest API is mature. So mature the
>API, get
>> rid of local_client.
>>
>
>I think we should do it ASAP and get rid of the local client (but
>provide
>equivalent binary CLI to do
>the same that current CLI does). Once we remove the local
>implementation we
>should switch all the
>CLI methods to use the API and eventually decouple it from main airflow
>code once it is done.
>
>7. Having the CLI manipulate `airflow.cfg` is fine as long as it checks
>if
>> it has the right permissions to do so, the OS will enforce that
>anyway.
>>
>Agree.
>
>
>> 8. They are not in the same namespace. `airflow.api.client.*` vs
>> `airflow.www.api`. If you mean by ‘package’ ‘Airflow’, then I suggest
>maybe
>> packaging the API/Client separately of break up the Airflow package
>in
>> “airflow-common”, “airflow-client”, “airflow-server”.
>
>Same thought exactly ^^^
>
>
>> 9. That’s not unexpected: the interface is documented whilst not
>perfectly.
>> The CLI *should* use the public API. As in eat you own dog food. The
>(json)
>> client api resides in its own namespace which does *not* import
>Airflow at
>> all. So interfacing clients should not use the local_client.
>>
>
>Unfortunately this is not true - due to the way our __init_.py in
>airflow
>package is defined,
>pretty much everything in airflow has an implicit dependency on pretty
>much
>all the code. See above.
>Even if there are no "explicit" imports, just startup time is something
>that we cannot do anything about.
>
>In my opinion you are mixing several issues. The CLI has its issues,
>the
>> API has its issues and maybe our packaging has its issues. We should
>> address them separately.
>>
>
>I think Kamil is right here in mixing the issues. They are already
>mixed
>unfortunately. We need to decide now
>what is the target we want to achieve (separate api server? Separate
>client? Separate binaries? Separate python packages?
>Separate pypi packages?). Even if we do not implement it all and go
>through
>a transition period, we need to know where we
>are heading. Unfortunately we have this nasty airflow.__init__.py
>problem
>that I think severely hampers any decoupling effort
>we want to do - client/server and we need to decide first what we want
>to
>do and how to decouple (in the future) and define
>our roadmap how to get there and where to break backwards
>compatibility.
>
>
>
>Kamil:
>
>
>> I don't like the current remote mode in CLI for several reasons:
>> 1) The default API security configuration makes all API data public
>> 2) We do not provide sufficient security mechanisms to secure the
>API.
>> 3) The current API does not allow you to limit permissions so that
>the
>> user can only use the selected feature. If the user has access to the
>> API, he can do anything.
>>
>
>I think we should not be put-off by current API implementation, It was
>experimental after all. I think we have three options
>- implement completely new architecture/deployment approach
>- promote the current API to be "official" by adding the above
>- implement some kind of mixture with using the same technology -
>FAB but more of the features (open-api) and changing architecture to be
>more of a client-server separation (my preference).
>
>
>> 4) An abstraction leak between json_client and local_client.
>> a) The API Clients have no defined exceptions.
>> b) The API Clients have no defined return types
>>
>
>I think this can all be handled by adding it. And FAB's OpenApi +
>swagger
>specs is one of the more common/standard way of defining those, where
>we
>have documentation, generating clients in multiple languages etc.
>
>
>> 6) This forces duplicate code. This is dangerous because one method
>> may not be sufficiently tested and thus behave unexpectedly.
>>
>
>I agree we should not have duplicated code. We should remove local
>client.
>
>7) Not everything should be possible via the API. e.g. Tomek Urbaszek
>> is now working to add new commands that allow editing of the
>> airflow.cfg file.
>> PR: https://github.com/apache/airflow/pull/7130
>
>
>I think this depends on the general architecture we come up with. If we
>decide
>to come up with clear client/server split, the API for editing
>configuration
>will be the ONLY way to change it and it will be only available
>remotely.
>I think - similarly to Bolke - that in the "production" deployments,
>even the admins should not have access to the DB of Airflow, nor to
>configuration
>of Airflow. It should all be protected via API access mechanisms.
>
>
>> 8) We do not have sufficient separation of the API client and API
>> server. The API client and API server are in the same python package.
>> This seriously threatens the interoperability of this API. It's now
>> very easy to build an API that will return data in an unexpected
>> format.
>>
>
>Agree - what we have now is bad and can be easily mixed and abused.
>I think we should agree on the level of separation (python package/
>pypi package/ client-server deployment) we want to have and base all
>our
>further decisions on that.
>
>9) This means that people who want to use the API will only use the
>> specified client API. The same API client that is used by the CLI,
>> which is very unexpected because it has severe limitations.
>>
>
>Indeed. But I think we should remove those imitations.
>
>
>> I suggest that we do not develop remote mode in current CLI, and even
>> delete it in fear future.
>
>
>I think quite the opposite - we should turn all our interactions into
>remote ones.
>It might be that you want to achieve the same eventually. Maybe we talk
>about the same
>but look at it from a different perspective and different sequence of
>reaching the target.
>We need to talk a bit more about it to really understand whether we are
>not
>talking about
>the same long term goal - and then agree on what will be the technology
>to
>get us there
>and agree the way how to get there.
>
>
>> a) Build an API that meets the basic requirements:
>
>
>Yes. But I think it can be achieved by improving and extending features
>of
>current API. I think
>it is a matter of technology choice and what steps we are going to
>take.
>
>
>> b) Build an API client. It is best if the client is generated
>> automatically.
>>
>Yep. That's what FAB + OpenAPI + Swagger is very likely to deliver. We
>do
>not need to do it
>ourselves - this is a mostly solved problem.
>
>
>> c) We should build a CLI that provides access to the API in a secure
>> manner.
>>
>Agree. And again - this is what FAB + Open API seems to deliver pretty
>much
>out-of-the box.
>
>
>> Of course, we can try to build temporary solutions if the users need
>> it, but this should be very limited and should have a warning about
>> the level of stability and security. So if the user needs it can add
>> commands that support remote mode, but the user should be aware that
>> this is not a stable and secure solution.
>>
>
>I think it's already there -> "experimental" is a clear indication that
>this will likely
>change in the future.
>
>
>> I have plans for Q1 to build a stable API that does not have the
>above
>> issues.
>
>
>I think we need that fairly fast - for Airflow 2.0. And I am pretty
>convinced that if we use some existing
>framework we can implement it rather fast - providing that we will
>"stand
>on the shoulders of giants"
>and use what's already there. APIs in python programs are already
>solved
>problem. We do not need
>to build our own solution for that - just decide on architecture/split,
>plugin a framework and
>implement endpoints.
>
>
>J.
>
>-- 
>
>Jarek Potiuk
>Polidea <https://www.polidea.com/> | Principal Software Engineer
>
>M: +48 660 796 129 <+48660796129>
>[image: Polidea] <https://www.polidea.com/>

Reply via email to