[ 
https://issues.apache.org/jira/browse/AIRFLOW-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779362#comment-16779362
 ] 

jack commented on AIRFLOW-1847:
-------------------------------

[[email protected]] Just a thought... You can always have external code that 
listens to any event you like and, when the event is caught, triggers the 
Airflow DAG by running the command: 
{code:bash}
airflow trigger_dag <DAG>...
{code}
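A rough sketch of what such external glue code could look like (purely 
illustrative; the route and port are made up, it only relies on the existing 
{{trigger_dag}} CLI and its {{--conf}} flag):
{code:python}
# Illustrative glue service: listens for an external event (here an HTTP
# webhook served with Flask) and shells out to the Airflow CLI.
import json
import subprocess

from flask import Flask, request

app = Flask(__name__)

@app.route("/hooks/<dag_id>", methods=["POST"])
def on_event(dag_id):
    payload = request.get_json(force=True, silent=True) or {}
    # Pass the payload through --conf so the triggered DAG run can read it.
    subprocess.check_call([
        "airflow", "trigger_dag", dag_id,
        "--conf", json.dumps(payload),
    ])
    return "", 204

if __name__ == "__main__":
    app.run(port=5050)
{code}
The payload passed through {{--conf}} is then available to the triggered DAG 
run as its conf.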
 

 

> Webhook Sensor
> --------------
>
>                 Key: AIRFLOW-1847
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1847
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: core, operators
>            Reporter: gsemet
>            Assignee: gsemet
>            Priority: Minor
>              Labels: api, sensors, webhook
>         Attachments: airflow-webhook-proposal.png
>
>
> h1. Webhook sensor
> May require a hook in the experimental API
> Register API endpoints and wait for input on each.
> It is different from the {{dag_runs}} API in that the format is not Airflow 
> specific; it is just a callback web URL called by an external system on some 
> event, with its application-specific content. The content is really important 
> and needs to be sent to the DAG (as an XCom?)
> Use Case:
> - A DAG registers a WebHook sensor named {{<webhookname>}}
> - A custom endpoint is exposed at 
> {{http://myairflow.server/api/experimental/webhook/<webhookname>}}.
> - I set this URL in the external system I wish to use the webhook from, e.g. a 
> GitHub/GitLab project webhook.
> - When the external application performs a request to this URL, the request is 
> automatically forwarded to the WebHook sensor. For simplicity, we can have a 
> JsonWebHookSensor that would be able to carry any kind of JSON content.
> - The sensor's only job would normally be to trigger the execution of a DAG, 
> providing it with the JSON content as an XCom.
> If there are several requests at the same time, the system should be scalable 
> enough not to crash or slow down the web UI. It is also possible to 
> instantiate an independent flask/gunicorn server to split the load. It would 
> mean it runs on another port, but this could be just an option in the 
> configuration file or even a completely independent application ({{airflow 
> webhookserver}}). I saw that recent changes integrated gunicorn into the 
> Airflow core; I guess that can help with this use case.
> To handle the load, I think it is good that the API part just posts the 
> received requests to an internal queue so the sensor can handle them later 
> without risk of missing one, as in the sketch below.
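> As a rough sketch (purely illustrative; none of these names exist in Airflow 
> today), the API part could look like this:
> {code}
> # Illustrative sketch only: a Flask blueprint that accepts webhook POSTs and
> # stores the raw payloads in a queue for the sensor to consume later.
> import queue
>
> from flask import Blueprint, jsonify, request
>
> webhook_queues = {}  # one in-memory queue per registered webhook name
> webhook_api = Blueprint("webhook_api", __name__)
>
> @webhook_api.route("/api/experimental/webhook/<name>", methods=["POST"])
> def receive_webhook(name):
>     payload = request.get_json(force=True, silent=True) or {}
>     webhook_queues.setdefault(name, queue.Queue()).put(payload)
>     # Return immediately; the sensor drains the queue on its own schedule.
>     return jsonify(status="queued"), 202
> {code}
> An in-memory queue only works within a single process; with several gunicorn 
> workers, a persistent queue (database table, redis, amqp) would be needed, 
> which matches the evolutions listed below.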
> Documentation would be updated to describe the classic scheme to implement 
> this use case, which would look like:
> !airflow-webhook-proposal.png!
> I think it is good to split it into two DAGs: one for linear handling of the 
> messages and triggering of new DAG runs, and a processing DAG that might be 
> executed in parallel.
> h2. Example usage in Sensor DAG: trigger a DAG on GitHub Push Event
> {code}
> sensor = JsonWebHookSensor(
>     task_id='my_task_id',
>     name='on_github_push',
> )
> # ... the user is responsible for triggering the processing DAG themselves.
> {code}
> In my GitHub project, I register the following URL in the webhook page:
> {code}
> http://airflow.myserver.com/api/experimental/webhook/on_github_push
> {code}
> From now on, on each push, GitHub will send a [JSON payload with this 
> format|https://developer.github.com/v3/activity/events/types/#pushevent] to 
> the previous URL.
> The {{JsonWebHookSensor}} receives the payload, and a new DAG run is triggered 
> from this sensing DAG, as sketched below.
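> A sketch of the complete sensing DAG could look like the following. The 
> {{JsonWebHookSensor}} import is hypothetical (it is the class proposed here), 
> while the trigger part uses the existing Airflow 1.x {{TriggerDagRunOperator}}:
> {code}
> from datetime import datetime
>
> from airflow import DAG
> from airflow.operators.dagrun_operator import TriggerDagRunOperator
> # Hypothetical import path: JsonWebHookSensor is the class proposed here.
> from airflow.sensors.webhook_sensor import JsonWebHookSensor
>
> dag = DAG('github_push_sensing',
>           start_date=datetime(2018, 1, 1),
>           # how the sensing DAG itself is started is out of scope here
>           schedule_interval=None)
>
> sensor = JsonWebHookSensor(
>     task_id='wait_for_push',
>     name='on_github_push',
>     dag=dag,
> )
>
> def forward_payload(context, dag_run_obj):
>     # Hand the JSON received by the sensor (stored as an XCom) to the
>     # processing DAG through the dag_run payload.
>     dag_run_obj.payload = context['ti'].xcom_pull(task_ids='wait_for_push')
>     return dag_run_obj
>
> trigger = TriggerDagRunOperator(
>     task_id='trigger_processing',
>     trigger_dag_id='github_push_processing',
>     python_callable=forward_payload,
>     dag=dag,
> )
>
> sensor >> trigger
> {code}
> The processing DAG can then read the forwarded payload from {{dag_run.conf}} 
> in its task context.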
> h2. Documentation update
> - add a new item in the [scheduling 
> documentation|https://pythonhosted.org/airflow/scheduler.html] about how to 
> trigger a DAG using a webhook
> - describe the sensing DAG + processing DAG scheme and provide the GitHub use 
> case as a real-life example
> h2. Possible evolutions
> - use an external queue (redis, amqp) to handle a large number of events
> - subscribe to a pub/sub system such as WAMP?
> - allow batch processing (trigger the processing DAG every n events or after a 
> timeout, gathering n messages altogether)
> - for higher throughput, Kafka?
> - Security, authentication and other related subjects might be addressed in 
> another ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
