Javier Domingo Cansino created AIRFLOW-2480:
-----------------------------------------------

             Summary: DAGs per dataset instead of per date
                 Key: AIRFLOW-2480
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2480
             Project: Apache Airflow
          Issue Type: New Feature
            Reporter: Javier Domingo Cansino


Currently Airflow runs on a date basis. All the scheduling and execution logic 
assumes that ETLs depend on the date they are run for. However, there is 
another set of use cases where it is not the date that varies, but the dataset 
itself.

One example application is processing genomic data. This data doesn't 
change, but the use case is to run all the DAGs you may have on samples, rather 
than on dates. The same applies when one has services that rely on 
diagnosing datasets.

For now, one way to solve this is by creating a DAG per user, setting its 
schedule to None, and triggering it manually from the UI/CLI. However, this has 
the drawback that there is only ever one column in the date-based view, as new 
datasets will just create new DAGs.

Of course, backfill processes would then run a specific DAG on all 
the samples, rather than on just a specific one.
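
Until something like that exists, a backfill over samples can be approximated 
with the current tooling by triggering one manual run per sample. The following 
is a minimal sketch, not a proposed implementation: it assumes the Airflow 1.x 
`airflow trigger_dag` command with its `--run_id` and `--conf` options, and it 
uses a single unscheduled DAG that receives the sample in its run configuration 
(a variant of the workaround above, not the DAG-per-dataset setup itself). The 
DAG id `process_sample`, the `sample_id` key and the sample names are 
hypothetical.

```python
import json
import shlex


def build_trigger_commands(dag_id, sample_ids):
    """Return one `airflow trigger_dag` argument list per sample.

    Assumes the Airflow 1.x CLI; the dag_id and the "sample_id" conf key
    are illustrative names, not anything defined by Airflow.
    """
    commands = []
    for sample_id in sample_ids:
        # The run configuration carries the dataset identity instead of a date.
        conf = json.dumps({"sample_id": sample_id})
        commands.append([
            "airflow", "trigger_dag",
            "--run_id", "sample__{}".format(sample_id),
            "--conf", conf,
            dag_id,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands rather than executing them, so they can be reviewed.
    for argv in build_trigger_commands("process_sample", ["S001", "S002"]):
        print(" ".join(shlex.quote(part) for part in argv))
```

Inside the DAG, a task would then read the sample from the run configuration 
(`dag_run.conf`) rather than from the execution date. The scheduler still 
stores and displays these runs by date, which is exactly the limitation 
described above.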

There are a few questions I would like to ask:

 * How tightly coupled is the current design of the scheduler/executors in 
Airflow to dates?

 * Is this a contribution someone would be interested in (besides me)?

 * Is there any work in progress on a similar feature?

 

Cheers, Javier



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
