The development experience of DAG authoring should definitely be improved (and
it will be - this will be our focus in the coming months). But IMHO it makes
no sense to add a DAG authoring experience to the Airflow UI when you have
PyCharm, IntelliJ, VSCode, Vim, the GitHub UI and plenty of other tools that
allow you to edit Python code WAY WAY WAY more efficiently than any editor we
could build in. It would make zero sense for us to re-develop all those
features in the Airflow UI.

For development you'd do much better by starting `airflow standalone` and
editing DAG files that are available locally (which lets you run the DAGs
immediately after you change them), or even by using the DebugExecutor:
https://airflow.apache.org/docs/apache-airflow/stable/executor/debug.html
to run the tasks (which does not need a running Airflow at all), or by using
`airflow tasks test` or `airflow dags test` - neither of which needs a running
Airflow installation either, just a locally installed airflow package. All of
those are actually a much better Python development environment.
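
For illustration, the quick local loop looks roughly like this (the DAG id,
task id and date are just placeholders):

    # run a single task without a scheduler or webserver running
    airflow tasks test my_dag my_task 2022-08-25

    # run a whole DAG for one logical date - also without a running installation
    airflow dags test my_dag 2022-08-25

and for the DebugExecutor route the docs show adding a small `__main__` block
to the DAG file, roughly:

    if __name__ == "__main__":
        dag.clear()
        dag.run()

so the DAG can be run and debugged straight from the IDE.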

But if we ever adopt some kind of declarative approach for DAGs, we might
well consider making them editable via the Airflow UI. That is much more
viable, both from a security point of view and because any declarative
approach we might want to use will be rather "airflow" or "workflow"
specific, so there will not be nearly as many better editing tools for it as
there are today for Python programs.

J.




On Thu, Aug 25, 2022 at 5:42 PM Nishant Sharma <nishantsharma0...@gmail.com>
wrote:

> Hi,
> I have also felt the need at times to create DAGs through the REST
> API. But I understand the security concerns associated with such an
> implementation.
>
> If not submission through the REST API, then at least some sort of *development
> mode* in the Airflow interface to create and edit DAGs for a user
> session. Not sure if this was brought up previously.
>
> Thanks,
> Nishant
>
> On Thu, Aug 25, 2022 at 6:32 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Just in case - please watch the devlist for the announcement of the "SIG
>> multitenancy" group if it slips my mind.
>>
>> On Thu, Aug 25, 2022 at 1:31 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Cool. I will make sure to include you! I think this is something that
>>> will happen in September; the holiday period is not the best time to organize it.
>>>
>>> On Thu, Aug 25, 2022 at 5:50 AM Mocheng Guo <gmca...@gmail.com> wrote:
>>>
>>>> My use case needs automation and security: those are the two key
>>>> requirements, and it does not have to be a REST API if there is another
>>>> way that DAGs could be submitted to cloud storage securely. Sure, I would
>>>> appreciate it if you could include me when organizing AIP-1 related
>>>> meetings. Kerberos is a ticket-based system in which a ticket has a limited
>>>> lifetime. Using Kerberos, a workload could be authenticated before
>>>> persistence so that Airflow uses its Kerberos keytab to execute it, which
>>>> is similar to the current implementation in the worker. Another possible
>>>> scenario is that a persisted workload needs to include a renewable Kerberos
>>>> TGT to be used by the Airflow worker, but this is more complex and I would
>>>> be happy to discuss it further in meetings. I will draft a more detailed
>>>> document for review.
>>>>
>>>> thanks
>>>> Mocheng
>>>>
>>>>
>>>> On Thu, Aug 18, 2022 at 1:19 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> None of those requirements are supported by Airflow. And opening a REST
>>>>> API does not solve the authentication use case you mentioned.
>>>>>
>>>>> This is a completely new requirement you have - basically what you
>>>>> want is workflow identity, and it should be rather independent of the way
>>>>> the DAG is submitted. It would require attaching some kind of identity and
>>>>> signature, and some way of making sure that the DAG has not been tampered
>>>>> with, so that the worker could use the identity when executing the
>>>>> workload and be sure that no one else modified the DAG - including any of
>>>>> the files that the DAG uses. This is an interesting case, but it has
>>>>> nothing to do with using or not using the REST API. The REST API alone
>>>>> will not give you the user identity guarantees that you need here. The
>>>>> distributed nature of Airflow basically requires that such a workflow
>>>>> identity be provided by cryptographic signatures and by verifying the
>>>>> integrity of the DAG, rather than basing it on REST API authentication.
>>>>>
>>>>> BTW. We already do support Kerberos authentication for some of our
>>>>> operators, but the identity is necessarily that of the instance executing
>>>>> the workload - not the user submitting the DAG.
>>>>>
>>>>> This could be one of the improvement proposals that could in the
>>>>> future become a sub-AIP of AIP-1 (Improve Airflow Security). If you are
>>>>> interested in leading and proposing such an AIP, I will soon (in a month or
>>>>> so) be re-establishing #sig-multitenancy meetings (see AIP-1 for recordings
>>>>> and minutes of previous meetings). We already have AIP-43 and AIP-44
>>>>> approved there (with AIP-43 close to completion), and the next steps should
>>>>> be introducing a fine-grained security layer for executing the workloads.
>>>>> Adding workload identity might be part of it. If you would like to work on
>>>>> that - you are most welcome. It means preparing and discussing proposals,
>>>>> getting consensus from the involved parties, leading it to a vote and
>>>>> finally implementing it.
>>>>>
>>>>> J
>>>>>
>>>>> czw., 18 sie 2022, 02:44 użytkownik Mocheng Guo <gmca...@gmail.com>
>>>>> napisał:
>>>>>
>>>>>> >> Could you please elaborate on why this would be a problem to use
>>>>>> those (really good for file pushing) APIs?
>>>>>>
>>>>>> Submitting DAGs directly to a cloud storage API does help with part
>>>>>> of the use case, but cloud storage does not provide the security a data
>>>>>> warehouse needs. A typical auth model supported in a data warehouse is
>>>>>> Kerberos, and a warehouse provides a limited view to a Kerberos user via
>>>>>> authorization rules. We need users to submit DAGs with identities
>>>>>> supported by the data warehouse, so that Apache Spark jobs will be
>>>>>> executed as the Kerberos user who submitted the DAG, which in turn
>>>>>> decides what data can be processed. There may also be a need to handle
>>>>>> impersonation, so there needs to be an additional layer to handle data
>>>>>> warehouse auth, e.g. Kerberos.
>>>>>>
>>>>>> Assuming DAGs are already inside the cloud storage, I think
>>>>>> AIP-5/20 would work better than the current mono-repo model if it could
>>>>>> offer more flexibility and lower latency, and I would be very interested
>>>>>> in being part of the design and implementation.
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 12, 2022 at 10:56 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>> wrote:
>>>>>>
>>>>>>> First, I appreciate everyone's valuable feedback. Airflow by design
>>>>>>> has to accept code, and both Tomasz's and Constance's examples make me
>>>>>>> think that the security judgement should be on the actual DAGs rather
>>>>>>> than on how DAGs are accepted or on the process itself. To expand a
>>>>>>> little more with another example: say another service provides an API
>>>>>>> which can be invoked by its clients; the service validates user inputs
>>>>>>> (e.g. SQL) and generates Airflow DAGs which use the validated
>>>>>>> operators/macros. Those DAGs are safe to be pushed through the API.
>>>>>>> There are certainly cases where DAGs may not be safe, e.g. an API
>>>>>>> service on a public cloud with shared tenants and no knowledge of how
>>>>>>> the DAGs are generated; in such cases the API service can apply access
>>>>>>> control on the identity or even reject all calls when they are
>>>>>>> considered unsafe. Please let me know if the example makes sense, and
>>>>>>> if there is common interest - having an Airflow-native write path would
>>>>>>> benefit the community instead of everyone building their own solution.
>>>>>>>
>>>>>>> You seem to repeat more of the same. This is exactly what we want
>>>>>>> to avoid. IF you can push code over the API, you can push ANY code. And
>>>>>>> precisely the "access control" you mention, or rejecting the call when
>>>>>>> the code is "considered unsafe" - those are decisions we have already
>>>>>>> deliberately decided we do not want the Airflow REST API to make.
>>>>>>> Whether the code is generated or not does not matter, because Airflow
>>>>>>> has no idea whatsoever whether it has been tampered with between the
>>>>>>> time it was generated and pushed. The only way Airflow can know that
>>>>>>> the code has not been tampered with is when it generates the DAG code
>>>>>>> on its own based on a declarative input. The limit is to push
>>>>>>> declarative information only. You CANNOT push code via the REST API.
>>>>>>> This is out of the question. The case is closed.
>>>>>>>
>>>>>>> The middle loop usually happens in a Jupyter notebook; it needs to
>>>>>>> change the data/features used by the model frequently, which in turn
>>>>>>> leads to Airflow DAG updates. Do you mind elaborating on how to automate
>>>>>>> the changes inside a notebook and programmatically submit DAGs through
>>>>>>> git+CI/CD while giving the user quick feedback? I understand git+CI/CD
>>>>>>> is technically possible, but the overhead involved is a major reason
>>>>>>> users reject Airflow for other alternative solutions, e.g. a git repo
>>>>>>> requires manual approval even if DAGs can be programmatically submitted,
>>>>>>> and CI/CD is a slow offline process with a large repo.
>>>>>>>
>>>>>>> Case 2 is actually (if you read the article I posted above, it's
>>>>>>> written there) the case where shared volumes could still be used and are
>>>>>>> better. This is why it's great that Airflow supports multiple DAG
>>>>>>> syncing solutions - your "middle" environment does not have to use git
>>>>>>> sync, as it is not "production" (unless you want to mix development with
>>>>>>> testing, that is, which is a terrible, terrible idea).
>>>>>>>
>>>>>>> Your data scientist, for the middle loop, does:
>>>>>>>
>>>>>>> a) cp my_dag.py "/my_middle_volume_shared_and_mounted_locally" - if
>>>>>>> you use a shared volume of some sort (NFS/EFS etc.)
>>>>>>> b) aws s3 cp my_dag.py "s3://my-middle-testing-bucket/" - if your
>>>>>>> dags are on S3 and synced using s3 sync
>>>>>>> c) gsutil cp my_dag.py "gs://my-bucket" - if your dags are on GCS
>>>>>>> and synced using a GCS sync
>>>>>>>
>>>>>>> Those are excellent "file push" APIs. They do the job. I cannot
>>>>>>> imagine why the middle-loop person would have a problem using them.
>>>>>>> All of that can also be fully automated - they all have nice Python and
>>>>>>> other language APIs, so you can even make the IDE run those commands
>>>>>>> automatically on every save if you want.
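>>>>>>>
>>>>>>> For illustration, a minimal "push on save" sketch using boto3 (the
>>>>>>> bucket name and key prefix here are made up):
>>>>>>>
>>>>>>>     import boto3
>>>>>>>
>>>>>>>     def push_dag(local_path: str, bucket: str = "my-middle-testing-bucket") -> None:
>>>>>>>         # upload the DAG file into the prefix that the S3 sync on the
>>>>>>>         # Airflow side picks up
>>>>>>>         s3 = boto3.client("s3")
>>>>>>>         s3.upload_file(local_path, bucket, "dags/" + local_path.rsplit("/", 1)[-1])
>>>>>>>
>>>>>>>     push_dag("my_dag.py")
>>>>>>>
>>>>>>> A file watcher or an IDE "on save" hook could call something like this
>>>>>>> after every edit.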
>>>>>>>
>>>>>>> Could you please elaborate on why this would be a problem to use those
>>>>>>> (really good for file pushing) APIs?
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 12, 2022 at 6:20 PM Mocheng Guo <gmca...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> First, I appreciate everyone's valuable feedback. Airflow by design
>>>>>>>> has to accept code, and both Tomasz's and Constance's examples make me
>>>>>>>> think that the security judgement should be on the actual DAGs rather
>>>>>>>> than on how DAGs are accepted or on the process itself. To expand a
>>>>>>>> little more with another example: say another service provides an API
>>>>>>>> which can be invoked by its clients; the service validates user inputs
>>>>>>>> (e.g. SQL) and generates Airflow DAGs which use the validated
>>>>>>>> operators/macros. Those DAGs are safe to be pushed through the API.
>>>>>>>> There are certainly cases where DAGs may not be safe, e.g. an API
>>>>>>>> service on a public cloud with shared tenants and no knowledge of how
>>>>>>>> the DAGs are generated; in such cases the API service can apply access
>>>>>>>> control on the identity or even reject all calls when they are
>>>>>>>> considered unsafe. Please let me know if the example makes sense, and
>>>>>>>> if there is common interest - having an Airflow-native write path would
>>>>>>>> benefit the community instead of everyone building their own solution.
>>>>>>>>
>>>>>>>> Hi Xiaodong/Jarek, regarding your suggestion, let me elaborate on a
>>>>>>>> use case. Here are the three loops a data scientist goes through to
>>>>>>>> develop a machine learning model:
>>>>>>>> - inner loop: iterate on the model locally.
>>>>>>>> - middle loop: iterate on the model on a remote cluster with
>>>>>>>> production data, say using Airflow DAGs behind the scenes.
>>>>>>>> - outer loop: done with iteration; publish the model to
>>>>>>>> production.
>>>>>>>> The middle loop usually happens in a Jupyter notebook; it needs to
>>>>>>>> change the data/features used by the model frequently, which in turn
>>>>>>>> leads to Airflow DAG updates. Do you mind elaborating on how to automate
>>>>>>>> the changes inside a notebook and programmatically submit DAGs through
>>>>>>>> git+CI/CD while giving the user quick feedback? I understand git+CI/CD
>>>>>>>> is technically possible, but the overhead involved is a major reason
>>>>>>>> users reject Airflow for other alternative solutions, e.g. a git repo
>>>>>>>> requires manual approval even if DAGs can be programmatically
>>>>>>>> submitted, and CI/CD is a slow offline process with a large repo.
>>>>>>>>
>>>>>>>> Such a use case is pretty common for data scientists, and a better
>>>>>>>> **online** service model would help open up more possibilities for
>>>>>>>> Airflow and its users, as additional layers providing more value (like
>>>>>>>> Constance mentioned - enabling users with no engineering or Airflow
>>>>>>>> domain knowledge to use Airflow) could be built on top of Airflow,
>>>>>>>> which remains a lower-level orchestration engine.
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> Mocheng
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 11, 2022 at 10:46 PM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I really like Tomek's idea.
>>>>>>>>>
>>>>>>>>> If we ever go for (which is not unlikely) some "standard"
>>>>>>>>> declarative way of describing DAGs, all my security and packaging
>>>>>>>>> concerns are gone - and submitting such a declarative DAG via the API
>>>>>>>>> is quite viable. Simply submitting Python code this way is a no-go for
>>>>>>>>> me :). Such a declarative DAG could just be stored in the DB and
>>>>>>>>> scheduled and executed using only the "declaration" from the DB -
>>>>>>>>> without ever touching the DAG "folder" and without allowing the user
>>>>>>>>> to submit any executable code this way. All the code to execute would
>>>>>>>>> already have to be in Airflow in this case.
>>>>>>>>>
>>>>>>>>> And I also very much agree that this case can be solved with Git.
>>>>>>>>> I think we are generally undervaluing the role Git plays in DAG
>>>>>>>>> distribution for Airflow.
>>>>>>>>>
>>>>>>>>> I think when the user feels the need (I very much understand the
>>>>>>>>> need, Constance) to submit the DAG via an API, rather than adding the
>>>>>>>>> option of submitting the DAG code via the "Airflow REST API", we
>>>>>>>>> should simply answer this:
>>>>>>>>>
>>>>>>>>> *Use Git and git sync. "git push" then becomes the standard
>>>>>>>>> "API" you wanted for pushing the code.*
>>>>>>>>>
>>>>>>>>> This has all the flexibility you need; it has integration with
>>>>>>>>> pull requests and CI workflows, keeps history, etc. etc. When we tell
>>>>>>>>> people "use Git" - we get ALL of that and more for free. Standing on
>>>>>>>>> the shoulders of giants.
>>>>>>>>> If we start thinking about integrating code push via our own
>>>>>>>>> API - we basically start the journey of rewriting Git, as eventually
>>>>>>>>> we will have to support those cases. That makes absolutely no sense to
>>>>>>>>> me.
>>>>>>>>>
>>>>>>>>> I am even starting to think that we should make "git sync" a
>>>>>>>>> separate (and much more viable) option that is pretty much the "main
>>>>>>>>> recommendation" for Airflow, rather than just "yet another option
>>>>>>>>> among shared folders and baked-in DAGs".
>>>>>>>>>
>>>>>>>>> I recently even wrote my thoughts about this in the post "Shared
>>>>>>>>> Volumes in Airflow - the good, the bad and the ugly":
>>>>>>>>> https://medium.com/apache-airflow/shared-volumes-in-airflow-the-good-the-bad-and-the-ugly-22e9f681afca
>>>>>>>>> which goes into much more detail on why I think so.
>>>>>>>>>
>>>>>>>>> J.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 11, 2022 at 8:43 PM Constance Martineau
>>>>>>>>> <consta...@astronomer.io.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> I understand the security concerns, and generally agree, but as a
>>>>>>>>>> regular user I have always wished we could upload DAG files via an
>>>>>>>>>> API. It opens the door to having an upload button, which would be
>>>>>>>>>> nice. It would make Airflow a lot more accessible to non-engineering
>>>>>>>>>> types.
>>>>>>>>>>
>>>>>>>>>> I love the idea: implementing a manual review option in
>>>>>>>>>> conjunction with some sort of hook (similar to Airflow cluster
>>>>>>>>>> policies) would be a good middle ground. An administrator could use
>>>>>>>>>> that hook to do checks against DAGs or run security scanners, and
>>>>>>>>>> decide whether or not to implement a review requirement.
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 11, 2022 at 1:54 PM Tomasz Urbaszek <
>>>>>>>>>> turbas...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> In general I second what XD said. CI/CD feels better than
>>>>>>>>>>> sending DAG files over an API, and the security issues arising from
>>>>>>>>>>> accepting "any Python file" are probably quite big.
>>>>>>>>>>>
>>>>>>>>>>> However, I think this proposal can be tightly related to
>>>>>>>>>>> "declarative DAGs". Instead of sending a DAG file, the user would
>>>>>>>>>>> send the DAG definition (operators, inputs, relations) in a
>>>>>>>>>>> predefined format that is not code. This of course has some
>>>>>>>>>>> limitations, like the inability to define custom macros or callbacks
>>>>>>>>>>> on the fly, but it may be a good compromise.
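>>>>>>>>>>>
>>>>>>>>>>> Purely for illustration - no such format exists in Airflow today,
>>>>>>>>>>> and the field and operator names below are just placeholders - such
>>>>>>>>>>> a declarative payload might look roughly like this (written here as
>>>>>>>>>>> a Python dict, but it could equally be YAML or JSON):
>>>>>>>>>>>
>>>>>>>>>>>     dag_definition = {
>>>>>>>>>>>         "dag_id": "example_declarative_dag",
>>>>>>>>>>>         "schedule": "@daily",
>>>>>>>>>>>         "tasks": [
>>>>>>>>>>>             # only operators already installed in Airflow could be referenced
>>>>>>>>>>>             {"task_id": "extract", "operator": "ValidatedSqlOperator",
>>>>>>>>>>>              "params": {"sql": "SELECT 1"}},
>>>>>>>>>>>             {"task_id": "load", "operator": "ValidatedLoadOperator",
>>>>>>>>>>>              "params": {"target_table": "example"}},
>>>>>>>>>>>         ],
>>>>>>>>>>>         # relations refer to the task_ids defined above
>>>>>>>>>>>         "relations": [["extract", "load"]],
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>> The server side would turn such a declaration into a DAG itself, so
>>>>>>>>>>> no user-supplied code would ever be executed.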
>>>>>>>>>>>
>>>>>>>>>>> Another thought - if we implement something like "DAG via API",
>>>>>>>>>>> then we should consider adding an option to review DAGs (approval
>>>>>>>>>>> queue, etc.) to reduce the security issues that are otherwise
>>>>>>>>>>> mitigated by, for example, deploying DAGs from git (where we have
>>>>>>>>>>> code review, security scanners, etc.).
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Tomek
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 11 Aug 2022 at 17:50, Xiaodong Deng <xdd...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Mocheng,
>>>>>>>>>>>>
>>>>>>>>>>>> Please allow me to share a question first: so in your proposal,
>>>>>>>>>>>> the API is still accepting an Airflow DAG as the payload (just
>>>>>>>>>>>> binarized or compressed), right?
>>>>>>>>>>>>
>>>>>>>>>>>> If that's the case, I may not be fully convinced: the
>>>>>>>>>>>> objectives in your proposal are automation & programmatically
>>>>>>>>>>>> submitting DAGs. These can already be achieved in an efficient way
>>>>>>>>>>>> through CI/CD practices + a centralized place to manage your DAGs
>>>>>>>>>>>> (e.g. a Git repo to host the DAG files).
>>>>>>>>>>>>
>>>>>>>>>>>> As you are already aware, allowing this via the API adds
>>>>>>>>>>>> additional security concerns, and I doubt that it "breaks even".
>>>>>>>>>>>>
>>>>>>>>>>>> Kindly let me know if I have missed anything or misunderstood
>>>>>>>>>>>> your proposal. Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> XD
>>>>>>>>>>>> ----------------------------------------------------------------
>>>>>>>>>>>> (This is not a contribution)
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Aug 10, 2022 at 1:46 AM Mocheng Guo <gmca...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have an enhancement proposal for the REST API service. This
>>>>>>>>>>>>> is based on the observation that Airflow users want to be able to
>>>>>>>>>>>>> access Airflow more easily as a platform service.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The motivation comes from the following use cases:
>>>>>>>>>>>>> 1. Users like data scientists want to iterate over data
>>>>>>>>>>>>> quickly, with interactive feedback in minutes, e.g. managing data
>>>>>>>>>>>>> pipelines inside a Jupyter notebook while executing them in a
>>>>>>>>>>>>> remote Airflow cluster.
>>>>>>>>>>>>> 2. Services targeting specific audiences can generate DAGs
>>>>>>>>>>>>> based on inputs like user commands or external triggers, and they
>>>>>>>>>>>>> want to be able to submit DAGs programmatically without manual
>>>>>>>>>>>>> intervention.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I believe supporting such use cases would improve Airflow's
>>>>>>>>>>>>> usability and gain it more popularity with users. The existing DAG
>>>>>>>>>>>>> repo brings considerable overhead for such scenarios: a shared repo
>>>>>>>>>>>>> requires offline processes and can be slow to roll out.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The proposal aims to provide an alternative where a DAG can be
>>>>>>>>>>>>> transmitted online; here are some key points:
>>>>>>>>>>>>> 1. A DAG is packaged individually so that it is
>>>>>>>>>>>>> distributable over the network. For example, a DAG may be a
>>>>>>>>>>>>> serialized binary or a zip file.
>>>>>>>>>>>>> 2. The Airflow REST API is the ideal place to talk to the
>>>>>>>>>>>>> external world. The API would provide a generic interface to accept
>>>>>>>>>>>>> DAG artifacts and should be extensible to support different
>>>>>>>>>>>>> artifact formats if needed.
>>>>>>>>>>>>> 3. DAG persistence needs to be implemented, since such DAGs are
>>>>>>>>>>>>> not part of the DAG repository.
>>>>>>>>>>>>> 4. The same behavior for DAGs submitted via the API as for those
>>>>>>>>>>>>> defined in the repo, i.e. users write DAGs in the same syntax, and
>>>>>>>>>>>>> their scheduling, execution, and web server UI should behave the
>>>>>>>>>>>>> same way.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since DAGs are written as code, running arbitrary code inside
>>>>>>>>>>>>> Airflow may pose high security risks. Here are a few proposals to
>>>>>>>>>>>>> mitigate them:
>>>>>>>>>>>>> 1. Accept DAGs only from trusted parties. Airflow already
>>>>>>>>>>>>> supports pluggable authentication modules, where strong
>>>>>>>>>>>>> authentication such as Kerberos can be used.
>>>>>>>>>>>>> 2. Execute DAG code as the API identity, i.e. a DAG created
>>>>>>>>>>>>> through the API service will have run_as_user set to the API
>>>>>>>>>>>>> identity.
>>>>>>>>>>>>> 3. To enforce data access control on DAGs, the API identity
>>>>>>>>>>>>> should also be used to access the data warehouse.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We shared a demo based on a prototype implementation at the
>>>>>>>>>>>>> summit, and some details are described in this ppt
>>>>>>>>>>>>> <https://drive.google.com/file/d/1luDGvWRA-hwn2NjPoobis2SL4_UNYfcM/view>.
>>>>>>>>>>>>> We would love to get feedback and comments from the community
>>>>>>>>>>>>> about this initiative.
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks
>>>>>>>>>>>>> Mocheng
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Constance Martineau
>>>>>>>>>> Product Manager
>>>>>>>>>>
>>>>>>>>>> Email: consta...@astronomer.io
>>>>>>>>>> Time zone: US Eastern (EST UTC-5 / EDT UTC-4)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <https://www.astronomer.io/>
>>>>>>>>>>
>>>>>>>>>>
