[AIRFLOW-1803] Time zone documentation
Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/f1ab56cc
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/f1ab56cc
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/f1ab56cc

Branch: refs/heads/master
Commit: f1ab56cc6ad3b9419af94aaa333661c105185883
Parents: 518a41a
Author: Bolke de Bruin <[email protected]>
Authored: Sat Nov 18 14:04:15 2017 +0100
Committer: Bolke de Bruin <[email protected]>
Committed: Mon Nov 27 15:54:27 2017 +0100

----------------------------------------------------------------------
 docs/index.rst    |   1 +
 docs/timezone.rst | 143 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 144 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/f1ab56cc/docs/index.rst
----------------------------------------------------------------------
diff --git a/docs/index.rst b/docs/index.rst
index 2a1f1c1..42349ea 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -83,6 +83,7 @@ Content
     scheduler
     plugins
     security
+    timezone
     api
     integration
     faq

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/f1ab56cc/docs/timezone.rst
----------------------------------------------------------------------
diff --git a/docs/timezone.rst b/docs/timezone.rst
new file mode 100644
index 0000000..ca30686
--- /dev/null
+++ b/docs/timezone.rst
@@ -0,0 +1,143 @@
+Time zones
+==========
+
+Support for time zones is enabled by default. Airflow stores datetime information in UTC internally and in the database.
+It allows you to run your DAGs with time zone dependent schedules. At the moment Airflow does not convert datetimes to
+the end user's time zone in the user interface. Templates used in Operators are not converted either. Time zone
+information is exposed and it is left up to the writer of the DAG to decide what to do with it.
+
+This is handy if your users live in more than one time zone and you want to display datetime information according to
+each user's wall clock.
+
+Even if you are running Airflow in only one time zone it is still good practice to store data in UTC in your database
+(even before Airflow became time zone aware this was the recommended or even required setup). The main reason is
+Daylight Saving Time (DST). Many countries have a system of DST, where clocks are moved forward in spring and backward
+in autumn. If you're working in local time, you're likely to encounter errors twice a year, when the transitions
+happen. (The pendulum and pytz documentation discuss these issues in greater detail.) This probably doesn't matter
+for a simple DAG, but it's a problem if you are in, for example, financial services where you have end of day
+deadlines to meet.
+
+The time zone is set in `airflow.cfg`. By default it is set to utc, but you can change it to use the system's settings
+or an arbitrary IANA time zone, e.g. `Europe/Amsterdam`. It is dependent on `pendulum`, which is more accurate than
+`pytz`. Pendulum is installed when you install Airflow.
+
+Please note that the Web UI currently only runs in UTC.
+
+Concepts
+--------
+Naïve and aware datetime objects
+''''''''''''''''''''''''''''''''
+
+Python's datetime.datetime objects have a tzinfo attribute that can be used to store time zone information,
+represented as an instance of a subclass of datetime.tzinfo. When this attribute is set and describes an offset,
+a datetime object is aware. Otherwise, it's naive.
+
+You can use timezone.is_aware() and timezone.is_naive() to determine whether datetimes are aware or naive.
+
+Because Airflow uses time zone aware datetime objects, any datetime objects your code creates need to be aware too.
+
+.. code:: python
+
+    from airflow.utils import timezone
+
+    now = timezone.utcnow()
+    a_date = timezone.datetime(2017, 1, 1)
+
+
+Interpretation of naive datetime objects
+''''''''''''''''''''''''''''''''''''''''
+
+Although Airflow operates fully time zone aware, it still accepts naive datetime objects for `start_dates`
+and `end_dates` in your DAG definitions. This is mostly in order to preserve backwards compatibility. In
+case a naive `start_date` or `end_date` is encountered the default time zone is applied. It is applied
+in such a way that it is assumed that the naive datetime is already in the default time zone. In other
+words if you have a default time zone setting of `Europe/Amsterdam` and create a naive datetime `start_date` of
+`datetime(2017,1,1)` it is assumed to be a `start_date` of Jan 1, 2017 Amsterdam time.
+
+.. code:: python
+
+    from datetime import datetime
+
+    from airflow import DAG
+    from airflow.operators.dummy_operator import DummyOperator
+
+    # Naive start_date: interpreted as being in the default time zone
+    default_args = dict(
+        start_date=datetime(2016, 1, 1),
+        owner='Airflow'
+    )
+
+    dag = DAG('my_dag', default_args=default_args)
+    op = DummyOperator(task_id='dummy', dag=dag)
+    print(op.owner) # Airflow
+
+Unfortunately, during DST transitions, some datetimes don't exist or are ambiguous.
+In such situations, pendulum raises an exception. That's why you should always create aware
+datetime objects when time zone support is enabled.
+
+In practice, this is rarely an issue. Airflow gives you aware datetime objects in the models and DAGs, and most often,
+new datetime objects are created from existing ones through timedelta arithmetic. The only datetime that's often
+created in application code is the current time, and timezone.utcnow() automatically does the right thing.
+
+
+Default time zone
+'''''''''''''''''
+
+The default time zone is the time zone defined by the `default_timezone` setting under `[core]`. If
+you just installed Airflow it will be set to `utc`, which is recommended. You can also set it to
+`system` or an IANA time zone (e.g. `Europe/Amsterdam`). DAGs are also evaluated on Airflow workers,
+so it is important to make sure this setting is the same on all Airflow nodes.
+
+
+.. code:: ini
+
+    [core]
+    default_timezone = utc
+
+
+Time zone aware DAGs
+--------------------
+
+Creating a time zone aware DAG is quite simple. Just make sure to supply a time zone aware `start_date`. It is
+recommended to use `pendulum` for this, but `pytz` (to be installed manually) can also be used.
+
+.. code:: python
+
+    from datetime import datetime
+
+    import pendulum
+    from airflow import DAG
+    from airflow.operators.dummy_operator import DummyOperator
+
+    local_tz = pendulum.timezone("Europe/Amsterdam")
+
+    # Aware start_date: the DAG takes its time zone from tzinfo
+    default_args = dict(
+        start_date=datetime(2016, 1, 1, tzinfo=local_tz),
+        owner='Airflow'
+    )
+
+    dag = DAG('my_tz_dag', default_args=default_args)
+    op = DummyOperator(task_id='dummy', dag=dag)
+    print(dag.timezone) # <Timezone [Europe/Amsterdam]>
+
+
+
+Templates
+'''''''''
+
+Airflow returns time zone aware datetimes in templates, but does not convert them to local time so they remain in UTC.
+It is left up to the DAG to handle this.
+
+.. code:: python
+
+    import pendulum
+
+    local_tz = pendulum.timezone("Europe/Amsterdam")
+    # execution_date is the aware UTC datetime taken from the template context
+    local_tz.convert(execution_date)
+
+
+Cron schedules
+''''''''''''''
+
+If you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will
+then ignore daylight saving time. Thus, if you have a schedule that says
+run at end of interval every day at 08:00 GMT+1 it will always run at end of interval 08:00 GMT+1,
+regardless of whether daylight saving time is in effect.
+
+
+Time deltas
+'''''''''''
+For schedules with time deltas Airflow assumes you always will want to run with the specified interval. So if you
+specify a timedelta(hours=2) you will always want to run two hours later. In this case daylight saving time will
+be taken into account.
+
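
The naive/aware distinction and the "naive means default time zone" rule described under Concepts can be illustrated with the standard library alone. The sketch below is not Airflow's implementation: `assume_default_timezone` and `AMSTERDAM_WINTER` are hypothetical names introduced for illustration, and a fixed UTC+01:00 offset stands in for `Europe/Amsterdam` in winter (a fixed offset carries no DST rules, which is exactly why a real IANA zone via `pendulum` is preferable in practice).

```python
from datetime import datetime, timedelta, timezone


def is_aware(value):
    """Return True if the datetime carries a usable UTC offset."""
    return value.tzinfo is not None and value.utcoffset() is not None


def is_naive(value):
    """Return True if the datetime has no usable UTC offset."""
    return not is_aware(value)


def assume_default_timezone(value, default_tz):
    """Hypothetical helper: treat a naive datetime as wall-clock time in default_tz.

    Aware datetimes are returned unchanged.
    """
    if is_aware(value):
        return value
    return value.replace(tzinfo=default_tz)


# Assumption: fixed offset standing in for Europe/Amsterdam in winter (UTC+01:00).
AMSTERDAM_WINTER = timezone(timedelta(hours=1))

naive_start = datetime(2017, 1, 1)
aware_start = assume_default_timezone(naive_start, AMSTERDAM_WINTER)

# What a UTC-based store would hold for that start date.
stored_utc = aware_start.astimezone(timezone.utc)

print(is_naive(naive_start))   # True
print(is_aware(aware_start))   # True
print(stored_utc.isoformat())  # 2016-12-31T23:00:00+00:00
```

Midnight on Jan 1, 2017 in a UTC+01:00 zone corresponds to 23:00 UTC on the previous day, which is why the same `start_date` can appear as Dec 31 when viewed in UTC.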
