This has been one of the bigger gotchas for people on my team. Once people
"get it," they get it, but every single person in my company initially
assumed that the behavior was the opposite of what it is.

While I understand that there are arguments in favor of the scheduling
being the way that it is, it feels very counter-intuitive to me, and I
think the argument for least surprise goes in the direction of the left
bound.
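
To make the two bounds concrete, here is a minimal sketch in plain Python.
It is not Airflow's code; schedule_periods is a hypothetical helper that only
does the date arithmetic described in the documentation quoted in this
thread: each run is stamped with the left bound of its period and can start
only once the right bound has passed.

```python
from datetime import datetime, timedelta


def schedule_periods(start_date, interval, count):
    """Yield (stamp, earliest_start) for the first `count` periods.

    Hypothetical helper for illustration only, not an Airflow API.
    stamp          -- left bound of the period (the date shown on the run)
    earliest_start -- right bound of the period (when the run can begin)
    """
    for i in range(count):
        left = start_date + i * interval
        right = left + interval
        yield left, right


if __name__ == "__main__":
    # Daily schedule starting 2016-01-01, as in the documentation example.
    for stamp, earliest_start in schedule_periods(
        datetime(2016, 1, 1), timedelta(days=1), 3
    ):
        print(
            f"run stamped {stamp:%Y-%m-%d} "
            f"starts at or after {earliest_start:%Y-%m-%dT%H:%M}"
        )
```

With a weekly interval the gap is a full week, which is why a job that ran
today shows last week's date.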

On Wed, Jul 20, 2016 at 3:59 PM Tyrone Hinderson <[email protected]>
wrote:

> I hear all that, but I doubt most teams use all of the stock settings in
> the core section of airflow.cfg. My point is that if this setting were
> provided, obviously defaulting to the current scenario, many teams might
> not even notice that it's there. I agree that allowing different scenarios
> per-dag probably isn't necessary, and I think that scheduler/webserver-wide
> is a good level at which to place this sort of configuration.
>
> Barring a new setting, the biggest problem is that if one thinks of a
> scheduled job as something that is relevant with respect to an instant
> rather than with respect to a time range, the current system serves only to
> confuse; after all, only one timestamp is displayed, luring someone like me
> to think of it as the instant at which a task instance is relevant. At the
> very least, the UI should be more clear about airflow's current opinion on
> the matter by displaying a range (i.e. both bounds).
>
> That said, I'm not convinced that the new setting would do more harm than
> good.
>
> On Wed, Jul 20, 2016 at 3:23 PM Maxime Beauchemin <
> [email protected]> wrote:
>
> > There's been a fair amount of discussion about this already. The argument
> > on whether the left or the right bound of the scheduling period is more
> > intuitive is valid and there are solid arguments on both sides, depending
> > on your use cases. My personal point of view on this is that setting a
> > standard and sticking to it is less confusing than supporting different
> > options and managing the change from switching from one to the other.
> >
> > Related thoughts:
> > * supporting both within an environment (say defined on a per-dag basis)
> > is a bad idea (way too confusing!)
> > * supporting both as a cluster setting makes for a change management
> > headache when switching your environment from one to the other, note that
> > people internally may not even agree on which one to pick
> > * having the UI be more upfront about left and right bound of scheduling
> > periods is a reasonable solution (using clear ranges wherever
> > execution_date is used currently)
> >
> > Max
> >
> > On Tue, Jul 19, 2016 at 10:57 AM, Joy Gao <[email protected]> wrote:
> >
> > > +1 on this.
> > >
> > > We are using Airflow as a cron replacement, and we have biweekly jobs
> > > and monthly jobs as well.
> > > It would be really useful to be able to configure it such that dags run
> > > on the start_date and the timestamps correspond to it.
> > >
> > > On Tue, Jul 19, 2016 at 10:49 AM, Tyrone Hinderson <[email protected]>
> > > wrote:
> > >
> > > > I'm aware that a DAG scheduled to start at time X with interval Y
> > > > will first run at time X + Y. The documentation describes this:
> > > >
> > > > "Note that if you run a DAG on a schedule_interval of one day, the
> > > > run stamped 2016-01-01 will be trigger soon after 2016-01-01T23:59. In
> > > > other words, the job instance is started once the period it covers has
> > > > ended."
> > > >
> > > > I'd like to know if this behavior is configurable? There may be a
> > > > particular way of thinking about business processes that fits this
> > > > pattern; however, seeing last week's date on a weekly job that ran today
> > > > confuses my team, and I'd love to use a flag that makes
> > > > 1. DAGs run on the start_date
> > > > 2. DagRun timestamps correspond with the intended actual run date/time.
> > > >
> > >
> >
>
