Re: [PROPOSAL][AIP-32] Airflow REST API

Jarek Potiuk Sun, 01 Mar 2020 09:13:11 -0800

I have not yet reviewed the API in detail - and I am sure there will be
more things coming up while implementing it, but just one general comment
(to Ash and Matt concerns).


I think we need to agree on the "API first" vs. "Code First" approach (and
possibly vote if we have disagreement after discussion).

I  developed a number of APIs where API specs was generated from the code.
And that works great for development and iteration speed, but it usually
breaks down really quickly when you actually release the API for external
"API consumers". At this point you immediately lose all the flexibility
that comes with "code first" approach: you released the API already and you
are not supposed to change how the API is designed, yet the design is
by-product of the code so it is very difficult to make sure it stays
unchanged. You can still change the internal implementation but the
interface should be frozen anyway. It's much easier if your API definition
is your "source of truth" rather than code IMHO.

When you want to release a new version of the API - it should be released
as a bulk change "v2/v3..." (you do not release APIs frequently when your
unknown customers rely on it). And in the case of the "API first" approach
provided by connexion it is as easy as copying the specs, adding new/change
existing methods and map the existing code to those methods that have not
changed/add/modifycode where it actually changed. In the case of "code
first" approach you have to implement versioning approach yourself and
introduce your own approach - and usually you end-up in some ways of
freezing the OpenAPI spec and eventually you lose the flexibility of
code-gen and you do not have the validation and safety of the "API-First"
approach.

The "generate API from code" approach works well when both client and
server (for example backend and frontend of a web app) are developed and
released together. But when you have external consumers of the APIs - the
"API first" approach seems to work much better - and I believe in our case
we develop the API mostly for external consumers. As I see it - if you have
external, unknown consumers of your APIS - it is much easier to maintain in
the long run where the API spec is the "source of truth" rather than
"by-product".

I think we should all focus now on this question - should we go "API first"
or "Code first" - and once we agree on it - then we can dive into the
details.

Looking also at the blog post from Zalando team
<https://jobs.zalando.com/en/tech/blog/on-apis-and-the-zalando-api-guild/> and
Swaggers " about "Adopting an API first
<https://swagger.io/resources/articles/adopting-an-api-first-approach/>"
blog from Swagger - I am all-in for Connexion and API First (even I had
good code-first experience before).

J.


On Sat, Feb 29, 2020 at 2:02 AM Matt Buell <mattbuell...@gmail.com> wrote:

> Nice job Kamil, I really like where this is going - especially in regards
> to Variable and XCOM functionality.
>
> With regards to the Task Instance and XCOM endpoints, I was wondering what
> everyone's thoughts would be on using 'dag_run_id' as a unique identifier
> as opposed to the existing 'execution_date'? From a client side
> perspective, I think it would be easier to handle an ID versus a date. I'm
> basing this off of the below Jira issue that says dag_id and run_id are
> unique constraints (correct me if I'm wrong about this). I understand there
> would also be compatibility issues between this change and the experimental
> API.
>
> https://issues.apache.org/jira/browse/AIRFLOW-5593
>
>
> For the below methods, what about returning a list of IDs as opposed to the
> full models? You can return a higher number of records with a smaller
> amount data transfer.  The user could then get the full model with
> individual requests.
> GET /dags
> GET /dags/{dag_id}/tasks
>
>
> Personally, I've never used connexion before, however, it seems quite
> interesting to me. This yaml file would serve not only as a documentation
> spec, but also as API configuration? I see connexion provides response
> validation based off of the yaml configuration and ability to override the
> validator. However - my concerns are similar to Ash's, how easy would it be
> to maintain this yaml and maintain synchronization between docs and
> functionality?
>
>
> https://connexion.readthedocs.io/en/latest/response.html#response-validation
>
> Best,
> Matt
>
> On Fri, Feb 28, 2020 at 3:09 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > An easier URL to view the spec
> >
> https://editor.swagger.io/?url=https://raw.githubusercontent.com/PolideaInternal/airflow/1e3e444d03c2e08d8fcee274d9e745a3d3ddeca8/openapi.yaml
> >
> > On Feb 28 2020, at 4:01 pm, Ash Berlin-Taylor <a...@apache.org> wrote:
> > > To view the API spec <
> >
> https://github.com/PolideaInternal/airflow/blob/1e3e444d03c2e08d8fcee274d9e745a3d3ddeca8/openapi.yaml
> >
> > in an easier to consume form I used https://editor.swagger.io/ and
> > imported from the (raw) github URL
> > > Some comments on the proposed OpenAPI spec;
> > >
> > > connection id's are not unique, so get/delete might operate on multiple
> > entries. (This has caused some confusion in the past, but is probably a
> > separate issue to adding a REST API)
> > > dag_id is listed as "integer" type. it should be string. Same applies
> to
> > most other id types it appears.
> > >
> > > PATCH dag: fileloc shouldn't be writable (it's a security risk),
> > is_active and is_subdag are an internal field and shouldn't be writable
> > either.
> > >
> > > PATCH dag: - Did you think about/rule out having separate
> > /dag/{dag_id}/pause and unpause methods?
> > >
> > > I don't think taskReschedule should be exposed directly at all - it's
> an
> > internal detail for sensors in a specific state.
> > >
> > > PATCH variables - update_mask parameter: The way I've seen other APIs
> do
> > this is PATCH vs PUT requests. PATCH should only update the specified
> > fields, PUT is "here is the entire representation, make it look like
> this"
> > >
> > > PATCH variables - is_encrypted is exposed. It shouldn't be. It is not
> > user-setable (it has is not writable in the UI anymore)
> > >
> > > GET xComValues: This endpoint support reading resources across multiple
> > Task Instancce by specify a "-" as a !! - as a separator is not going to
> > work. That is a valid character in task ids.
> > >
> > > xComValues -- I feel the "values" is out of place/not needed.
> > >
> > >
> > >
> > > -ash
> > > (The thought of maintaining that swagger/openapi spec file is honestly
> a
> > bit daunting and making me lean more towards adding spec validation to
> FAB
> > than using connexion - specifically because it's a: huge and b: lives
> > separately to the code/endpoints)
> > > On Feb 28 2020, at 3:15 pm, Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > Great work Kamil, I can't wait till we have a fully featured API in
> > Airflow!
> > > >
> > > > So AIP-13 has been voted on and "accepted". Do you have a proposal
> for
> > what we should do with that AIP that we've already voted upon?
> > > > In the permissions section you talk about creating, for example
> > "can_edit_variable" -- but on which view? Permissions in FAB are a on a
> > specific view -- see
> >
> https://github.com/dpgaspar/Flask-AppBuilder/blob/120bc3ca38c4559a0099fe3b1a1badb319b6546e/flask_appbuilder/security/sqla/models.py#L71-L78
> > > > "There are not many FAB REST API experts" -- there aren't that many
> > connexion experts either - has anyone involved with the project used
> > connexion before?. It's a wildly inaccurate metric, but
> > https://hugovk.github.io/top-pypi-packages/ lists FAB in position 980
> and
> > connexion in 1146, so they are both about as popular as each other.
> > > >
> > > > On Feb 26 2020, at 4:40 pm, Kamil Breguła <kamil.breg...@polidea.com
> >
> > wrote:
> > > > > Hello,
> > > > >
> > > > > I just created "AIP-32 - Airflow REST API" proposal and would love
> > > > > community feedback and comments.
> > > > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-32%3A+Airflow+REST+API
> > > > > I would love to know what is your expectation from the API.
> > > > >
> > > > > We currently have one experimental API, but despite its existence
> for
> > > > > 2 years, it has not reached a stable level. There are many critical
> > > > > aspects to this implementation including no defined schema, very
> > > > > narrow range of functions, unsafe default configuration, no
> HATEOAS,
> > > > > and many others.
> > > > >
> > > > > In the past, Drewstone began work on REST API based on OpenAPI.
> It's
> > > > > described by AIP-13: OpenAPI 3 based API definition. However, it
> was
> > > > > not completed due to the lack of interest from the author and the
> > > > > Kerberos configuration problem (It was at a time when Breeze was
> > still
> > > > > developing, so configuring all dependencies, including Kerberos,
> was
> > a
> > > > > problem). It had a narrow range of features that are expected by
> > users
> > > > > e.g. access to XCOM data and management for connection is missing,
> > > > >
> > > > > The Polidea and Google teams together with the community want to
> make
> > > > > another attempt based on our and community experience. Airflow
> > > > > deserves new stable solutions.
> > > > >
> > > > > Any comments and feedback/discussion in the AIP-32 document are
> > welcome!
> > > > > Best regards,
> > > > > Kamil Breguła
> > > > >
> > > >
> > >
> >
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [PROPOSAL][AIP-32] Airflow REST API

Reply via email to