Javier Domingo Cansino created AIRFLOW-2480:
-----------------------------------------------
Summary: DAGs per dataset instead of per date
Key: AIRFLOW-2480
URL: https://issues.apache.org/jira/browse/AIRFLOW-2480
Project: Apache Airflow
Issue Type: New Feature
Reporter: Javier Domingo Cansino
Currently Airflow runs on a date basis. All the scheduling and running logic
assumes that ETLs depend on the date they are run. However, there is another
set of use cases where it is not the date that varies, but the dataset
itself.
One example application is processing genomic data. This data doesn't
change; the use case is to run all the DAGs you may have on samples, rather
than on dates. The same applies when one has services that rely on
diagnosing datasets.
For now, one way to solve this is to create a DAG per user, schedule it
with None, and trigger it manually from the UI/CLI. The drawback is that
there is only one column in the dates, as new datasets will just create
new DAGs.
Of course, backfill processes would then run a specific DAG on all
the samples, rather than on just one specific sample.
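
To make the workaround and the "backfill over samples" idea concrete, here is a
minimal sketch in plain Python. It only builds the CLI commands; it does not
call Airflow. The names `dag_id_for`, `trigger_command`, `backfill_commands`
and the `genomics_pipeline` prefix are hypothetical, and the sketch assumes the
Airflow 1.x command `airflow trigger_dag <dag_id>` (Airflow 2 spells it
`airflow dags trigger <dag_id>`) and one unscheduled DAG per dataset.

```python
import re


def dag_id_for(sample_id, pipeline="genomics_pipeline"):
    """Build a per-dataset DAG id (hypothetical naming scheme).

    Assumption: DAG ids should stick to alphanumerics, dashes, dots and
    underscores, so any other character is replaced with an underscore.
    """
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", sample_id)
    return f"{pipeline}__{safe}"


def trigger_command(sample_id, pipeline="genomics_pipeline"):
    """Return the argv list that would manually trigger one dataset's DAG.

    Assumes the Airflow 1.x CLI; the command is only built, never executed.
    """
    return ["airflow", "trigger_dag", dag_id_for(sample_id, pipeline)]


def backfill_commands(sample_ids, pipeline="genomics_pipeline"):
    """Emulate a 'backfill over samples': one trigger per dataset."""
    return [trigger_command(s, pipeline) for s in sample_ids]


if __name__ == "__main__":
    for cmd in backfill_commands(["sample-001", "sample 002"]):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run` on a machine where
Airflow is installed. The loop shows what the proposal would replace: with
datasets as a first-class dimension, the scheduler itself would iterate over
samples instead of an external script.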
There are a few questions I would like to ask:
* How tightly coupled is the current design of the scheduler/executors in
Airflow to dates?
* Is this a contribution someone would be interested in (besides me)?
* Is there any work in progress on a similar feature?
Cheers, Javier
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)