Absolutely yes. I think this is really what is in the current plans/roadmap :).

It's just a matter of when and how to enforce it. The current experimental
API is, well... still experimental.

What we really need to do is implement a much more complete API approach,
which is on the Airflow 2.0 roadmap. AFAIK Kamil is going to start
discussions and make some proposals for that this week.
Decoupling the CLI from the DB is rather high on the list for the API, I
believe.

I think it's really important to solve it "well", i.e. introduce a flexible
authentication/authorisation mechanism to the API, have a way to decouple
both the client and the web server from database operations, and eventually
decouple workers from DB access. This is our ultimate goal, and I think we
should define the broader picture/target now (i.e. how to design the API so
that it serves all those options in the future) and have a plan for
introducing it gradually, so that several contributors/committers can take
part in the process. One possible sequence for introducing the API might be,
for example:

1) Either graduate the existing experimental API to "official" or introduce
    a new "official" API solution if we find it better.
2) Reimplement all CLI commands to use the new API.
3) Reimplement the web server to use the new API.
4) (Very long term) Decouple workers from the database.
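
To make step 2 a bit more concrete, here is a rough sketch of what a CLI
command could look like once it talks only to a client interface and never
to the database. Everything in it is hypothetical and only for illustration:
the `ApiClient` interface, the `InMemoryClient` stand-in, the `list_pools`
method and the `cli_pool_list` function are made-up names, not the existing
experimental API or any agreed design.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ApiClient(ABC):
    """Hypothetical interface the CLI would depend on instead of a DB session."""

    @abstractmethod
    def list_pools(self) -> List[Dict]:
        """Return pools as plain dicts, e.g. {"name": ..., "slots": ...}."""


class InMemoryClient(ApiClient):
    """Stand-in implementation; a real one would be local or HTTP-based."""

    def __init__(self, pools: List[Dict]) -> None:
        self._pools = list(pools)

    def list_pools(self) -> List[Dict]:
        return list(self._pools)


def cli_pool_list(client: ApiClient) -> str:
    """CLI command body: it only formats what the client returns."""
    return "\n".join(f"{p['name']}\t{p['slots']}" for p in client.list_pools())
```

The point of the shape is that the command takes a client as its only
dependency, so swapping a local implementation for a REST one would not
touch the CLI code at all.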

I think forbidding the CLI from accessing the database should happen between
1) and 2), once we have an "official" API solution in place (and it should
be verified automatically - we can easily add a pylint plugin that checks
the CLI package for DB usage). In my opinion we cannot expect people to use
the API until it is out of experimental, or until we have agreed on a viable,
stable long-term alternative.
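
As a rough idea of what such an automated check could do, here is a minimal
sketch. Note that it is not a pylint plugin: it is a stand-alone check built
on the standard `ast` module, which a plugin could wrap. The function name
`find_db_imports` is made up, and the list of forbidden modules is an
assumption (only `airflow.utils.session` comes from this thread; `sqlalchemy`
is my guess at what else we would want to catch).

```python
import ast
from typing import List, Tuple

# Assumption: modules that count as "direct DB access" for the CLI package.
FORBIDDEN_PREFIXES = ("airflow.utils.session", "sqlalchemy")


def _is_forbidden(name: str) -> bool:
    return any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES)


def find_db_imports(source: str, filename: str = "<cli>") -> List[Tuple[int, str]]:
    """Return (line number, module) for every import of a forbidden module."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Covers both "from pkg.mod import x" and "from pkg import mod".
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        for name in names:
            if _is_forbidden(name):
                hits.append((node.lineno, name))
                break  # report each import statement at most once
    return sorted(hits)
```

Running this over every file in the CLI package in CI and failing the build
on a non-empty result would give the same guarantee as a lint rule. It only
looks at static imports, so dynamic imports would slip through.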

Once this is in place, no new CLI command with direct DB access should be
allowed.

J.


On Sat, Jan 18, 2020 at 12:59 PM Bolke de Bruin <bdbr...@gmail.com> wrote:

> Hi All,
>
> I’ve noticed that we are still implementing new features or are doing
> refactoring of CLI commands that directly interface with the database
> instead of using the abstractions that should be made available from the
> API specification. Why is this an issue? The CLI is used by arbitrary user
> to interface with Airflow operations. Airflow relies on the database to be
> its single source of truth. A user that is able to read the configuration
> of Airflow is currently able to manipulate the database. The CLI requires
> database access hence the information to deal with the database is in the
> configuration file. To improve security the CLI should use a rest API which
> allows for proper authn/authz and segregation of duties.
>
> In the past I have introduced the experimental API with a “local_client”
> and “json_client” implementation. The local_client still allows for direct
> database access and its only function is to be there during the transition
> period to have the full rest api available. After that it should be
> deprecated and removed.
>
> My suggestion is to disallow any new functionality in the CLI that directly
> relies on “airflow.utils.session” and only allow new functionality to go
> through the API client. For now that would mean 2 implementations: local
> and json. Of course refactoring the current state should be on the list in
> order to remove the “local_client”.
>
> The API client should be available to other packages as well. So maybe we
> should package the cli and client api implementations into
> “airflow-client”.
>
> What are your thoughts?
>
> Thanks
> Bolke
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>