ashb commented on a change in pull request #5048: [AIRFLOW-3370] Add stdout
output options to Elasticsearch task log handler
URL: https://github.com/apache/airflow/pull/5048#discussion_r288925385
##########
File path: docs/howto/write-logs.rst
##########
@@ -137,3 +137,50 @@ example:
[2017-10-03 21:57:51,306] {base_task_runner.py:98} INFO - Subtask:
[2017-10-03 21:57:51,306] {models.py:186} INFO - Filling up the DagBag from
/airflow/dags/example_dags/example_bash_operator.py
**Note** that the path to the remote log file is listed on the first line.
+
+.. _write-logs-elasticsearch:
+
+Writing Logs to Elasticsearch
+------------------------------------
+
+Airflow can be configured to read task logs from Elasticsearch and optionally write logs to stdout in standard or JSON format. These logs can later be collected and forwarded to the Elasticsearch cluster using tools such as Fluentd, Logstash, or others.
+
+You can choose to have all task logs from workers output to the highest parent level process, instead of the standard file locations. This allows for some additional flexibility in container environments like Kubernetes, where container stdout is already being logged to the host nodes. From there, a log-shipping tool can be used to forward them along to Elasticsearch. To use this feature, set the ``elasticsearch_write_stdout`` option in ``airflow.cfg``.
+You can also choose to have the logs output in JSON format, using the ``elasticsearch_json_format`` option. Airflow uses the standard Python ``logging`` module, and JSON fields are extracted directly from the LogRecord object. To use this feature, set the ``elasticsearch_json_fields`` option in ``airflow.cfg`` to a comma-delimited string of the fields you want collected for the logs. These fields come from the LogRecord object in the ``logging`` module. `Documentation on the available attributes can be found here <https://docs.python.org/3/library/logging.html#logrecord-objects>`_.
+
+To use the handler, ``airflow.cfg`` must be configured as follows:
+
+.. code-block:: ini
+
+    [core]
+    # Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elasticsearch.
+    # Users must supply an Airflow connection id that provides access to the storage
+    # location. If remote_logging is set to true, see UPDATING.md for additional
+    # configuration requirements.
+    remote_logging = True
+
+    [elasticsearch]
+    elasticsearch_host = {{ host }}:{{ port }}
Review comment:
```suggestion
elasticsearch_host = my-elasticsearch:9200
```
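Pulling the options described above together, a complete ``[elasticsearch]`` section might look like the sketch below. The option names come from the text; the host value and the field list are illustrative assumptions, not Airflow defaults:

```ini
[elasticsearch]
; Illustrative host value, matching the review suggestion above.
elasticsearch_host = my-elasticsearch:9200
; Write task logs to stdout instead of the standard file locations.
elasticsearch_write_stdout = True
; Emit each log line as JSON rather than plain text.
elasticsearch_json_format = True
; Comma-delimited LogRecord attributes to include in the JSON output
; (example attributes only; choose any from the logging module docs).
elasticsearch_json_fields = asctime, filename, lineno, levelname, message
```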
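The mapping from ``elasticsearch_json_fields`` to LogRecord attributes can be illustrated with a small stand-alone formatter. This is a sketch of the idea, not Airflow's actual handler; the class name and field list are assumptions for the example:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Serialize the named LogRecord attributes to a JSON line.

    Illustrative only: mimics how a stdout log handler might extract
    fields listed in ``elasticsearch_json_fields`` from a LogRecord.
    """

    def __init__(self, json_fields):
        super().__init__()
        self.json_fields = json_fields

    def format(self, record):
        # Populate record.message and record.asctime the way the
        # stdlib Formatter does before reading attributes off the record.
        record.message = record.getMessage()
        record.asctime = self.formatTime(record)
        return json.dumps({field: getattr(record, field) for field in self.json_fields})


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter(["asctime", "filename", "lineno", "levelname", "message"]))
logger = logging.getLogger("example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
# Emits one JSON object per log call, e.g. with asctime, filename, lineno,
# levelname and message keys.
logger.info("Filling up the DagBag")
```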