From http://pythonhosted.org/airflow/faq.html:
*What’s the deal with ``start_date``?*

start_date is partly legacy from the pre-DagRun era, but it is still relevant in many ways. When creating a new DAG, you probably want to set a global start_date for your tasks using default_args. The first DagRun to be created will be based on the min(start_date) for all your tasks. From that point on, the scheduler creates new DagRuns based on your schedule_interval, and the corresponding task instances run as your dependencies are met. When introducing new tasks to your DAG, you need to pay special attention to start_date, and may want to reactivate inactive DagRuns to get the new task onboarded properly.

We recommend against using dynamic values as start_date, especially datetime.now(), as it can be quite confusing. The task is triggered once the period closes, and in theory an @hourly DAG would never get to an hour after now, since now() keeps moving along.

Previously we also recommended using a rounded start_date in relation to your schedule_interval. This meant an @hourly job would be at 00:00 minutes:seconds, a @daily job at midnight, and a @monthly job on the first of the month. This is no longer required. Airflow will now auto-align the start_date and the schedule_interval, by using the start_date as the moment to start looking. You can use any sensor or a TimeDeltaSensor to delay the execution of tasks within the schedule interval. While schedule_interval does allow specifying a datetime.timedelta object, we recommend using the macros or cron expressions instead, as that enforces this idea of rounded schedules.

When using depends_on_past=True it’s important to pay special attention to start_date, as the past dependency is not enforced only on the specific schedule of the start_date specified for the task. It’s also important to watch DagRun activity status in time when introducing new depends_on_past=True tasks, unless you are planning on running a backfill for the new task(s). Also important to note is that a task's start_date, in the context of a backfill CLI command, gets overridden by the backfill command's start_date. This allows a backfill on tasks that have depends_on_past=True to actually start; if that weren't the case, the backfill just wouldn't start.

I've put two short sketches at the bottom of this message illustrating the two main points above: a static, rounded start_date set through default_args, and a TimeDeltaSensor used to delay work within the schedule interval.

On Tue, Aug 9, 2016 at 7:44 AM, הילה ויזן <[email protected]> wrote:

> Hi,
>
> We're experiencing a strange problem with the start_date configuration in
> Airflow.
>
> When we first ran the DAGs, we defined the start_date as 'datetime.now()',
> which at the time was 01/08/2016. This worked fine. A week afterwards, we
> changed the DAGs to a specific newer date, 08/08/2016, and reset all of
> the tasks. After resetting Airflow and all of the DAGs *we are still
> seeing the tasks running from the original date (01/08)*. Why is this
> happening?
>
> We don't understand why the tasks are still using the old date. Is there a
> cache/DB/persistent file that the DAG reads on startup that overrides our
> definition? Is it maybe Celery? We would really appreciate your input
> because we are totally stuck.
>
> We use Airflow version 1.7.1.3 with Postgres as the backend DB.
> In addition, we run in CeleryExecutor mode with RabbitMQ as the Celery backend.
>
> Thank you,
> Hila
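
To make the first point concrete, here is a minimal sketch (untested) of a DAG that follows the FAQ's advice: a static, rounded start_date set through default_args rather than datetime.now(). The dag_id, owner and task are made up for illustration; the import paths are the ones used by the 1.x series.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    default_args = {
        'owner': 'data-team',                # illustrative value
        # Static and rounded to the schedule; avoid datetime.now(),
        # which shifts every time the file is parsed.
        'start_date': datetime(2016, 8, 8),
    }

    dag = DAG(
        dag_id='example_start_date',         # hypothetical dag_id
        default_args=default_args,
        schedule_interval='@daily',          # a cron expression like '0 0 * * *' also works
    )

    print_date = BashOperator(
        task_id='print_date',
        bash_command='date',
        dag=dag,
    )

With a static start_date the scheduler has a fixed anchor: the first DagRun covers 2016-08-08, is triggered once that daily period closes, and subsequent runs follow the schedule_interval from there.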
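
And a sketch of the TimeDeltaSensor suggestion, for delaying work within the schedule interval rather than shifting start_date. It reuses the dag and print_date objects from the sketch above; the import path is the 1.x one and the two-hour delay is just an example.

    from datetime import timedelta

    from airflow.operators.sensors import TimeDeltaSensor

    # Succeeds two hours after the schedule period closes
    # (execution_date + schedule_interval + delta), holding back downstream tasks.
    wait_two_hours = TimeDeltaSensor(
        task_id='wait_two_hours',
        delta=timedelta(hours=2),
        dag=dag,
    )

    print_date.set_upstream(wait_two_hours)

For the depends_on_past=True point, the backfill command the FAQ refers to is something along the lines of "airflow backfill example_start_date -s 2016-08-08 -e 2016-08-14" (exact flags depend on your version); the start date passed there is what overrides the task-level start_date.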
