Repository: incubator-airflow
Updated Branches:
  refs/heads/master 2078daca3 -> ebe715c56


[AIRFLOW-1691] Add better Google cloud logging documentation

Closes #2671 from criccomini/fix-log-docs


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/ebe715c5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/ebe715c5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/ebe715c5

Branch: refs/heads/master
Commit: ebe715c565ad9206c9db6a496a1f97326d5baf8a
Parents: 2078dac
Author: Chris Riccomini <[email protected]>
Authored: Mon Oct 9 10:32:34 2017 -0700
Committer: Chris Riccomini <[email protected]>
Committed: Mon Oct 9 10:32:34 2017 -0700

----------------------------------------------------------------------
 UPDATING.md          |  6 ++--
 docs/integration.rst | 71 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ebe715c5/UPDATING.md
----------------------------------------------------------------------
diff --git a/UPDATING.md b/UPDATING.md
index 329f416..6a0b8bc 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -129,13 +129,13 @@ The `file_task_handler` logger is more flexible. You can 
change the default form
 
 #### I'm using S3Log or GCSLogs, what do I do!?
 
-IF you are logging to either S3Log or GCSLogs, you will need a custom logging 
config. The `REMOTE_BASE_LOG_FOLDER` configuration key in your airflow config 
has been removed, therefore you will need to take the following steps:
+If you are logging to Google cloud storage, please see the [Google cloud 
platform 
documentation](https://airflow.incubator.apache.org/integration.html#gcp-google-cloud-platform)
 for logging instructions.
+
+If you are using S3, the instructions should be largely the same as the Google 
cloud platform instructions above. You will need a custom logging config. The 
`REMOTE_BASE_LOG_FOLDER` configuration key in your airflow config has been 
removed; therefore, you will need to take the following steps:
 - Copy the logging configuration from 
[`airflow/config_templates/airflow_local_settings.py`](https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/airflow_local_settings.py).
 - Place it in a directory inside the Python import path `PYTHONPATH`. If you 
are using Python 2.7, ensure that `__init__.py` files exist so that it is 
importable.
 - Update the config by setting the remote log folder path explicitly in the 
copied logging config; the `REMOTE_BASE_LOG_FOLDER` key in `airflow.cfg` is not 
used anymore. 
 - Set `logging_config_class` to the module path and dict name. For example, if 
you place `custom_logging_config.py` at the base of your `PYTHONPATH`, you will 
need to set `logging_config_class = custom_logging_config.LOGGING_CONFIG` in 
your config, as of Airflow 1.8.
- 
-ELSE you don't need to change anything. If there is no custom config, the 
airflow config loader will still default to the same config. 
 
 ### New Features
 

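The S3 steps above can be sketched as a hypothetical `custom_logging_config.py`. The handler class path follows the `airflow.utils.log.s3_task_handler.S3TaskHandler` shipped with Airflow; the bucket, formatter, and filename template values are illustrative assumptions, not the exact template contents:

```python
import os

# Hypothetical custom_logging_config.py for S3 logging, loosely mirroring
# the template in airflow/config_templates/airflow_local_settings.py.
# The bucket name and filename template below are placeholders.
BASE_LOG_FOLDER = os.path.expanduser('~/airflow/logs')
S3_LOG_FOLDER = 's3://<bucket where logs should be persisted>/'  # trailing slash
FILENAME_TEMPLATE = '{dag_id}/{task_id}/{execution_date}/{try_number}.log'

LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': '[%(asctime)s] {%(filename)s:%(lineno)d} '
                      '%(levelname)s - %(message)s',
        },
    },
    'handlers': {
        's3.task': {
            # Handler class shipped in Airflow's logging utilities.
            'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': BASE_LOG_FOLDER,
            's3_log_folder': S3_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
    },
    'loggers': {
        # Route task logs to the S3 handler instead of the file handler.
        'airflow.task': {
            'handlers': ['s3.task'],
            'level': 'INFO',
            'propagate': False,
        },
    },
}
```

With this module at the base of `PYTHONPATH`, `logging_config_class = custom_logging_config.LOGGING_CONFIG` in `airflow.cfg` points Airflow at the dict above.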
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ebe715c5/docs/integration.rst
----------------------------------------------------------------------
diff --git a/docs/integration.rst b/docs/integration.rst
index 3b50586..cd6cc68 100644
--- a/docs/integration.rst
+++ b/docs/integration.rst
@@ -184,6 +184,77 @@ Airflow has extensive support for the Google Cloud 
Platform. But note that most
 Operators are in the contrib section. Meaning that they have a *beta* status, 
meaning that
 they can have breaking changes between minor releases.
 
+Logging
+''''''''
+
+Airflow can be configured to read and write task logs in Google cloud storage.
+Follow the steps below to enable Google cloud storage logging.
+
+#. Airflow's logging system requires a custom .py file to be located in the 
``PYTHONPATH``, so that it's importable from Airflow. Start by creating a 
directory to store the config file. ``$AIRFLOW_HOME/config`` is recommended.
+#. Set ``PYTHONPATH=$PYTHONPATH:<AIRFLOW_HOME>/config`` in the Airflow 
environment. If using Supervisor, you can set this in the ``supervisord.conf`` 
environment parameter. If not, you can export ``PYTHONPATH`` using your 
preferred method.
+#. Create empty files called ``$AIRFLOW_HOME/config/log_config.py`` and 
``$AIRFLOW_HOME/config/__init__.py``.
+#. Copy the contents of ``airflow/config_templates/airflow_local_settings.py`` 
into the ``log_config.py`` file that was just created in the step above.
+#. Customize the following portions of the template:
+
+    .. code-block:: python
+
+        # Add this variable to the top of the file. Note the trailing slash.
+        GCS_LOG_FOLDER = 'gs://<bucket where logs should be persisted>/'
+
+        # Rename DEFAULT_LOGGING_CONFIG to LOGGING_CONFIG
+        LOGGING_CONFIG = ...
+
+        # Add a GCSTaskHandler to the 'handlers' block of the LOGGING_CONFIG 
variable
+        'gcs.task': {
+            'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
+            'formatter': 'airflow.task',
+            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
+            'gcs_log_folder': GCS_LOG_FOLDER,
+            'filename_template': FILENAME_TEMPLATE,
+        },
+
+        # Update the airflow.task and airflow.task_runner blocks to use 
'gcs.task' instead of 'file.task'.
+        'loggers': {
+            'airflow.task': {
+                'handlers': ['gcs.task'],
+                ...
+            },
+            'airflow.task_runner': {
+                'handlers': ['gcs.task'],
+                ...
+            },
+            'airflow': {
+                'handlers': ['console'],
+                ...
+            },
+        }
+
+#. Make sure a Google cloud platform connection hook has been defined in 
Airflow. The hook should have read and write access to the Google cloud storage 
bucket defined above in ``GCS_LOG_FOLDER``.
+
+#. Update ``$AIRFLOW_HOME/airflow.cfg`` to contain:
+
+    .. code-block:: bash
+
+        task_log_reader = gcs.task
+        logging_config_class = log_config.LOGGING_CONFIG
+        remote_log_conn_id = <name of the Google cloud platform hook>
+
+#. Restart the Airflow webserver and scheduler, and trigger (or wait for) a 
new task execution.
+#. Verify that logs are showing up for newly executed tasks in the bucket 
you've defined.
+#. Verify that the Google cloud storage viewer is working in the UI. Pull up a 
newly executed task, and verify that you see something like:
+
+    .. code-block:: bash
+
+        *** Reading remote log from gs://<bucket where logs should be 
persisted>/example_bash_operator/run_this_last/2017-10-03T00:00:00/16.log.
+        [2017-10-03 21:57:50,056] {cli.py:377} INFO - Running on host 
chrisr-00532
+        [2017-10-03 21:57:50,093] {base_task_runner.py:115} INFO - Running: 
['bash', '-c', u'airflow run example_bash_operator run_this_last 
2017-10-03T00:00:00 --job_id 47 --raw -sd 
DAGS_FOLDER/example_dags/example_bash_operator.py']
+        [2017-10-03 21:57:51,264] {base_task_runner.py:98} INFO - Subtask: 
[2017-10-03 21:57:51,263] {__init__.py:45} INFO - Using executor 
SequentialExecutor
+        [2017-10-03 21:57:51,306] {base_task_runner.py:98} INFO - Subtask: 
[2017-10-03 21:57:51,306] {models.py:186} INFO - Filling up the DagBag from 
/airflow/dags/example_dags/example_bash_operator.py
+
+Note the top line that says it's reading from the remote log file.
+
+Please be aware that if you were persisting logs to Google cloud storage using 
the old-style airflow.cfg configuration method, the old logs will no longer be 
visible in the Airflow UI, though they'll still exist in Google cloud storage. 
This is a backwards-incompatible change. If you are unhappy with it, you can 
change the ``FILENAME_TEMPLATE`` to reflect the old-style log filename format.
+
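The ``logging_config_class`` setting above takes a dotted ``module.ATTRIBUTE`` path. A minimal sketch of how such a setting can be resolved (an illustration of the mechanism, not Airflow's actual loader; ``load_logging_config`` is a hypothetical helper):

```python
import importlib
import logging.config

def load_logging_config(dotted_path):
    # Split "log_config.LOGGING_CONFIG" into module part and attribute part,
    # import the module, and read the attribute from it. This only works if
    # the module is importable, which is why the steps above put
    # $AIRFLOW_HOME/config on PYTHONPATH (with __init__.py under Python 2.7).
    module_path, _, attr = dotted_path.rpartition('.')
    module = importlib.import_module(module_path)
    return getattr(module, attr)

# Demonstration with a stdlib attribute instead of the site-specific
# log_config module:
sep = load_logging_config('os.sep')

# The resolved dict is then applied the standard dictConfig way:
minimal = {'version': 1, 'disable_existing_loggers': False}
logging.config.dictConfig(minimal)
```

If the module cannot be imported, `importlib.import_module` raises `ImportError`, which is the failure mode you will see if the `PYTHONPATH` step was skipped.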
 BigQuery
 ''''''''
 
