This is a really fascinating idea: a REST API as a plugin. I'll have to think about how this fits in with security, but it's intriguing nonetheless.
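Roughly, I imagine the plugin side looking something like the untested sketch below. The blueprint name, the `/trigger_api/dags/<dag_id>/runs` route, and the token-header idea are placeholders of mine rather than anything taken from Cade's gist; the core of it is just creating a DagRun with a conf dict built from the request body.

from datetime import datetime

from flask import Blueprint, request, jsonify

from airflow.models import DagBag
from airflow.plugins_manager import AirflowPlugin
from airflow.utils.state import State

# Placeholder blueprint/route names; real auth (e.g. checking a token header)
# would need to be added before exposing this anywhere.
api_blueprint = Blueprint('trigger_api', __name__, url_prefix='/trigger_api')


@api_blueprint.route('/dags/<dag_id>/runs', methods=['POST'])
def trigger_dag_run(dag_id):
    # Whatever JSON body the caller posts becomes the DagRun's conf.
    conf = request.get_json(silent=True) or {}
    dag = DagBag().get_dag(dag_id)
    if dag is None:
        return jsonify(error='unknown dag_id: %s' % dag_id), 404

    execution_date = datetime.utcnow()
    run_id = 'api_trigger__%s' % execution_date.isoformat()
    dag.create_dagrun(
        run_id=run_id,
        execution_date=execution_date,
        state=State.RUNNING,
        conf=conf,                # readable later as dag_run.conf inside tasks
        external_trigger=True,
    )
    return jsonify(dag_id=dag_id, run_id=run_id, conf=conf), 201


class TriggerApiPlugin(AirflowPlugin):
    name = 'trigger_api'
    flask_blueprints = [api_blueprint]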
On Tue, Jul 12, 2016 at 12:04 PM, Cade Markegard <[email protected]> wrote:

> I've been playing around with creating an HTTP API using Airflow's plugins. Here's a little example of triggering the DagRun:
>
> https://gist.github.com/cademarkegard/e1adc20baf6fbae89bac2dcca3d2159e
>
> Hopefully that helps clear up how you could pass parameters to the DagRun. You'd probably also want to add some token-based auth for the route.
>
> Cade
>
> On Tue, Jul 12, 2016 at 11:34 AM Paul Minton <[email protected]> wrote:
>
>>> For the use case where the parameters only change parameters or the behavior of tasks (but not the shape of the DAG itself)
>>
>> This is the use case that I'm thinking of. But it's not clear how to change those parameters from the UI or the REST API (if that's at all possible).
>>
>> On Tue, Jul 12, 2016 at 10:48 AM, Maxime Beauchemin <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> A few notes around dynamic DAGs:
>>>
>>> We don't really support mutating a DAG's shape or structure based on source parameters. Think about it: it would be hard to fit the current paradigm of the UI, like representing that DAG in the tree view. We like to think of DAGs as pretty static or "slowly changing", similar to how a database physical model evolves in the lifecycle of an application or a data warehouse (at a similar rhythm). For those use cases (where input parameters would change the shape of the DAG), we think of those as different "singleton" DAGs that are expected to run a single time. To get this to work, we create a "DAG factory" as a python script that outputs many different DAG objects (with different dag_ids) and where `schedule_interval='@once'`, based on a config file or something equivalent (db configuration, airflow.models.Variable object, ...).
>>>
>>> For the use case where the parameters only change parameters or the behavior of tasks (but not the shape of the DAG itself), I recommend using a DAG where `schedule_interval=None` that is triggered with different parameters for its conf. Inside templates or operators you can access the context easily to refer to the related DagRun's conf parameters. You could potentially do that with a DAG on a schedule using XCom as well, where an early task would populate some XCom parameters that following tasks would read.
>>>
>>> Max
>>>
>>> On Mon, Jul 11, 2016 at 6:26 PM, Paul Minton <[email protected]> wrote:
>>>
>>>> I asked a very similar question in this thread that might provide a solution in the form of the --conf option in trigger_dag:
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201607.mbox/browser
>>>>
>>>> However, my last comment on the thread suggests exposing similar functionality to the REST API and the UI.
>>>>
>>>> On Mon, Jul 11, 2016 at 3:05 PM, Lance Norskog <[email protected]> wrote:
>>>>
>>>>> XCOM is a data store for passing data to and between tasks. This is how you would pass dynamic data to the starting task of a DAG.
>>>>> Is there a CLI command to add data to XCOM?
>>>>>
>>>>> On Mon, Jul 11, 2016 at 2:46 PM, Jon McKenzie <[email protected]> wrote:
>>>>>
>>>>>> Unless I'm missing it, it appears that it isn't possible to launch a DAG job with initial inputs to the first task instance in the workflow (without specifying those inputs in the DAG definition).
>>>>>>
>>>>>> Am I missing something?
>>>>>>
>>>>>> So for instance, I want user A to be able to launch the DAG with parameter foo = bar, and user B to be able to launch the same DAG with foo = baz. In my use case, this would be hooked up to a RESTful API, and the users wouldn't necessarily know anything about DAGs or what's happening behind the scenes.
>>>>>>
>>>>>> The closest I can think of to accomplish this is to generate run IDs in my REST API, store the (run ID, input) pair in a database, and retrieve the inputs in my first task in my DAG. But this seems like a very ham-handed, roundabout way of doing it. I'd much rather just create a DagRun with task_params that the scheduler automatically associates with the first task instance.
>>>>>>
>>>>>> Any thoughts?
>>>>>
>>>>> --
>>>>> Lance Norskog
>>>>> [email protected]
>>>>> Redwood City, CA
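To make Max's recommendation above concrete, here's a small, untested sketch of a conf-driven DAG; the dag_id, task ids, and the `foo` parameter are all made up for illustration. You could trigger it from the CLI with `airflow trigger_dag conf_driven_example --conf '{"foo": "bar"}'` (the --conf option Paul mentions), or from a REST endpoint that passes the same dict as the DagRun's conf.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

dag = DAG(
    dag_id='conf_driven_example',
    schedule_interval=None,          # never scheduled; only runs when triggered
    start_date=datetime(2016, 7, 1),
)


def use_conf(**context):
    # The triggering DagRun's conf dict rides along in the task context.
    foo = (context['dag_run'].conf or {}).get('foo', 'default')
    print('foo = %s' % foo)


read_in_python = PythonOperator(
    task_id='read_in_python',
    python_callable=use_conf,
    provide_context=True,
    dag=dag,
)

# The same conf is also reachable from templated fields.
read_in_template = BashOperator(
    task_id='read_in_template',
    bash_command='echo "foo is {{ dag_run.conf.get(\'foo\') }}"',
    dag=dag,
)

read_in_python.set_downstream(read_in_template)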

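And for the other case Max describes, where inputs change the shape of the DAG itself, a rough sketch of the "DAG factory" pattern: one file in the DAGs folder that reads a config source and emits one `@once` DAG per entry. The JSON file path, field names, and the single BashOperator per DAG are my own assumptions; a db table or an airflow.models.Variable would work the same way.

import json
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# e.g. [{"name": "backfill_client_a", "target": "client_a"}, ...]
CONFIG_PATH = os.path.join(os.path.dirname(__file__), 'singleton_dags.json')
with open(CONFIG_PATH) as f:
    entries = json.load(f)


def build_dag(entry):
    dag = DAG(
        dag_id='singleton_%s' % entry['name'],
        schedule_interval='@once',       # each generated DAG is expected to run once
        start_date=datetime(2016, 7, 1),
    )
    BashOperator(
        task_id='do_work',
        bash_command='echo "processing {{ params.target }}"',
        params={'target': entry['target']},
        dag=dag,
    )
    return dag


# Each generated DAG has to land in the module's globals so the DagBag picks it up.
for entry in entries:
    generated = build_dag(entry)
    globals()[generated.dag_id] = generated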