I have a use case different from yours but with a similar requirement: creating tasks dynamically to reflect something external to Airflow. I think a lot of people do, judging by how often dynamic DAGs come up here (feel free to search the list archives).
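Roughly, the pattern looks like this. The names (`fetch_ids`, `make_dag`) are just illustrative, and the `Dag` class is a minimal stand-in for `airflow.DAG` so the snippet runs on its own; the important part is the loop that binds one generated DAG per external ID into module-level globals, which is where Airflow's DagBag discovers them.

```python
def fetch_ids():
    """Stand-in for an API call (or a JSON/YAML read) returning external IDs."""
    return ["customer_a", "customer_b", "customer_c"]

class Dag:
    """Minimal stand-in for airflow.DAG so this sketch runs without Airflow."""
    def __init__(self, dag_id, schedule_interval):
        self.dag_id = dag_id
        self.schedule_interval = schedule_interval

def make_dag(item_id):
    # One DAG per config item, each with a daily schedule.
    return Dag(dag_id="scrape_%s" % item_id, schedule_interval="@daily")

# Airflow discovers DAGs by scanning a DAG file's module-level globals,
# so each generated DAG must be bound to a unique top-level name.
for _id in fetch_ids():
    dag = make_dag(_id)
    globals()[dag.dag_id] = dag
```

In a real DAG file you'd replace the stand-in class with `airflow.DAG` and attach operators inside `make_dag`; the globals-registration trick is the same.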
My system grabs a list of IDs from an external config via an API, but people commonly do this with JSON or YAML files as well. Then I loop through that config object to generate a dynamic DAG for each list item.

I can't really comment on the SubDAG aspect of your design, as I don't use SubDAGs much. Another idea is to have one task that fetches the list of all IDs externally, so that the DAG structure itself doesn't need to be dynamic.

It might help if you explain your use case a bit more. Is it something like ChangeDetection <https://www.changedetection.com>? At a high level it sounds a bit like a cache on top of the Airflow API.

*Taylor Edmiston*
TEdmiston.com <https://www.tedmiston.com/> | Blog <http://blog.tedmiston.com>
Stack Overflow CV <https://stackoverflow.com/story/taylor> | LinkedIn
<https://www.linkedin.com/in/tedmiston/> | AngelList <https://angel.co/taylor>

On Thu, Jan 18, 2018 at 4:18 PM, Swapnesh Chaubal <[email protected]> wrote:

> Hey all,
>
> We are planning to use Airflow for the following purpose. Could you
> please let me know whether this is a viable use case, or whether we
> should use some other framework instead of Airflow?
>
> We are using the Airflow REST API to hit Airflow with, say, an ID:
> 1. The first task in the DAG hits a service with that ID.
> 2. The next task creates a SubDAG which itself has two tasks.
> 3. A new instance of the SubDAG is scheduled every time the REST API
> is hit with a different ID as a parameter and different start and end
> times.
>
> The SubDAG calls a service to scrape some data. Thus the SubDAG is
> essentially being used as a scheduler, because we want something that
> can scrape data once every day.
>
> Thank you for your help,
> Swapnesh
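P.S. The static-DAG alternative I mentioned — one fixed DAG whose first task fetches the ID list at runtime, so the graph shape never changes — might look roughly like this. `fetch_ids` and `scrape_all` are invented placeholder names; in Airflow each would be the callable behind a `PythonOperator` rather than a bare function.

```python
def fetch_ids():
    # Placeholder for the task that calls the external API for the ID list.
    return ["id_1", "id_2"]

def scrape_all():
    # Downstream task: the loop over IDs happens *inside* the task,
    # so the DAG itself stays static no matter how many IDs there are.
    results = {}
    for _id in fetch_ids():
        results[_id] = "scraped:%s" % _id  # placeholder for the real scrape
    return results
```

The trade-off is visibility: with the dynamic approach each ID gets its own DAG (and its own task history in the UI), while here all IDs share one task run.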
