melle2 opened a new issue #11085: URL: https://github.com/apache/airflow/issues/11085
**Context**

What I'm going to describe here is more or less related to #7141, but the context is broader than described there, which is why I'm opening a separate bug report. I'm struggling with this issue too and have tried a lot to get the log entries to show up in the web GUI. Maybe I'm wrong, but as far as I can tell, this whole Elasticsearch logging pipeline cannot be used at all. I followed the discussion in #7141, but some of the points still seem unclear. I'll describe the issues at a high level rather than in terms of the Python code.

My configuration:

```
[elasticsearch]
host = http://elasticsearch_de:9200
log_id_template = {{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}
end_of_log_mark = end_of_log
write_stdout = True
json_format = True
json_fields = execution_date, asctime, filename, lineno, levelname, message
```

My example, the raw (snipped) console output after a DAG has been executed:

```
{"execution_date": "2020_09_21T14_00_00_000000", "asctime": null, "filename": "taskinstance.py", "lineno": 670, "levelname": "INFO", "message": "Dependencies all met for <TaskInstance: stg_cam_asm_load.asm_stage 2020-09-21T14:00:00+00:00 [queued]>", "dag_id": "stg_cam_asm_load", "task_id": "asm_stage", "try_number": "1"}
...
{"execution_date": "2020_09_21T14_00_00_000000", "asctime": null, "filename": "taskinstance.py", "lineno": 1187, "levelname": "INFO", "message": "Marking task as FAILED. dag_id=stg_cam_asm_load, task_id=asm_stage, execution_date=20200921T140000, start_date=20200921T150124, end_date=20200921T150125", "dag_id": "stg_cam_asm_load", "task_id": "asm_stage", "try_number": "1"}
{"execution_date": "2020_08_29T14_00_00_000000", "asctime": null, "filename": "local_task_job.py", "lineno": 102, "levelname": "INFO", "message": "Task exited with return code 1", "dag_id": "stg_cam_asm_load", "task_id": "mail_on_failure_production_support", "try_number": "1"}
end_of_log[2020-09-21 17:01:30,274: INFO/ForkPoolWorker-15] Task airflow.executors.celery_executor.execute_command[1cd84830-e24b-459a-9fc9-46bb4102709b] succeeded in 8.639222555000742s: None
```

I sent all of these entries to Elasticsearch manually, exactly as they were output:

1. ```curl -X POST -H 'Content-Type: application/json' -i 'http://elasticsearch_de:9200/airflow/_doc' --data '{"execution_date": "2020_08_29T14_00_00_000000", "asctime": null, "filename": "taskinstance.py", "lineno": 670, "levelname": "INFO", "message": "Dependencies all met for <TaskInstance: stg_cam_asm_load.mail_on_failure_production_support 2020-08-29T14:00:00+00:00 [queued]>", "dag_id": "stg_cam_asm_load", "task_id": "mail_on_failure_production_support", "try_number": "1"}'```
2. ```curl -X POST -H 'Content-Type: application/json' -i 'http://elasticsearch_de:9200/airflow/_doc' --data '{"execution_date": "2020_09_21T14_00_00_000000", "asctime": null, "filename": "taskinstance.py", "lineno": 1187, "levelname": "INFO", "message": "Marking task as FAILED. dag_id=stg_cam_asm_load, task_id=asm_stage, execution_date=20200921T140000, start_date=20200921T150124, end_date=20200921T150125", "dag_id": "stg_cam_asm_load", "task_id": "asm_stage", "try_number": "1"}'```
3. 
```curl -X POST -H 'Content-Type: application/json' -i 'http://elasticsearch_de:9200/airflow/_doc' --data '{"execution_date": "2020_08_29T14_00_00_000000", "asctime": null, "filename": "local_task_job.py", "lineno": 102, "levelname": "INFO", "message": "Task exited with return code 1", "dag_id": "stg_cam_asm_load", "task_id": "mail_on_failure_production_support", "try_number": "1"}'```
4. ```curl -X POST -H 'Content-Type: application/json' -i 'http://elasticsearch_de:9200/airflow/_doc' --data 'end_of_log[2020-09-21 17:01:30,274: INFO/ForkPoolWorker-15] Task airflow.executors.celery_executor.execute_command[1cd84830-e24b-459a-9fc9-46bb4102709b] succeeded in 8.639222555000742s: None'```

**Problem Description**

Based on these entries, the web server cannot find ANY entry, because of several issues:

1. Very obviously, there is no linefeed after "end_of_log", so the marker and the subsequent Celery output end up on one line and are sent to Elasticsearch together.
1.1 Even if I add a "\n" to the configuration, I see "end_of_log\n".
1.2 This output is plain text, not JSON.
2. The "log_id" defined by the "log_id_template" configuration is missing from every log line.
3. The "offset" is missing. The offset defines the order of the log entries; without it nothing is displayed.
4. 
In addition, there is a gap between the "execution_date" in the log output and the execution-date format the web GUI uses when querying Elasticsearch:

```
[2020-09-21 16:00:30,922] {base.py:81} INFO - GET http://elasticsearch_de:9200/_count [status:200 request:0.003s]
192.168.105.1 - - [21/Sep/2020:16:00:30 +0000] "GET /admin/airflow/get_logs_with_metadata?dag_id=stg_cam_asm_load&task_id=asm_stage&execution_date=2020-09-21T14%3A00%3A00%2B00%3A00&try_number=1&metadata=%7B%22end_of_log%22%3Afalse%2C%22last_log_timestamp%22%3A%222020-09-21T15%3A55%3A31.254697%2B00%3A00%22%2C%22offset%22%3A%220%22%7D HTTP/1.1" 200 116 "http://192.168.105.100:8080/admin/airflow/log?task_id=asm_stage&dag_id=stg_cam_asm_load&execution_date=2020-09-21T14%3A00%3A00%2B00%3A00&format=json" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0"
```

As you can see, `2020_09_21T14_00_00_000000` != `2020-09-21T14:00:00+00:00` (URL-encoded: `2020-09-21T14%3A00%3A00%2B00%3A00`).

**Expected Behavior**

Only after I fix everything described in steps 1-4, i.e. add [log_id] and [offset], adjust the [end_of_log] output, and adjust [execution_date]:

1. ```curl -X POST -H 'Content-Type: application/json' -i 'http://elasticsearch_de:9200/airflow/_doc' --data '{"log_id": "stg_cam_asm_load-asm_stage-2020-09-21T14:00:00+00:00-1", "execution_date": "2020_09_21T14_00_00_000000", "asctime": null, "filename": "taskinstance.py", "lineno": 670, "levelname": "INFO", "message": "Dependencies all met for <TaskInstance: stg_cam_asm_load.asm_stage 2020-09-21T14:00:00+00:00 [queued]>", "dag_id": "stg_cam_asm_load", "task_id": "asm_stage", "try_number": "1", "offset": "1"}'```
2. 
```curl -X POST -H 'Content-Type: application/json' -i 'http://elasticsearch_de:9200/airflow/_doc' --data '{"log_id": "stg_cam_asm_load-asm_stage-2020-09-21T14:00:00+00:00-1", "execution_date": "2020_09_21T14_00_00_000000", "asctime": null, "filename": "taskinstance.py", "lineno": 1187, "levelname": "INFO", "message": "Marking task as FAILED. dag_id=stg_cam_asm_load, task_id=asm_stage, execution_date=20200921T140000, start_date=20200921T150124, end_date=20200921T150125", "dag_id": "stg_cam_asm_load", "task_id": "asm_stage", "try_number": "1", "offset": "2"}'```
3. ```curl -X POST -H 'Content-Type: application/json' -i 'http://elasticsearch_de:9200/airflow/_doc' --data '{"log_id": "stg_cam_asm_load-asm_stage-2020-09-21T14:00:00+00:00-1", "execution_date": "2020_09_21T14_00_00_000000", "asctime": null, "filename": "local_task_job.py", "lineno": 102, "levelname": "INFO", "message": "Task exited with return code 1", "dag_id": "stg_cam_asm_load", "task_id": "asm_stage", "try_number": "1", "offset": "3"}'```
4. ```curl -X POST -H 'Content-Type: application/json' -i 'http://elasticsearch_de:9200/airflow/_doc' --data '{"log_id": "stg_cam_asm_load-asm_stage-2020-09-21T14:00:00+00:00-1", "message": "end_of_log", "offset": "4"}'```

...are all log entries displayed in the web GUI: 

What would be a helpful bugfix:

1. Ensure that every log output line is enriched with log_id and offset.
2. Log the end_of_log marker as valid JSON (or find a different way to signal the end of the log).
3. Make the [execution_date] in the log output match the value the web GUI queries in Elasticsearch.

**Note**: Maybe my understanding of the whole process is wrong. Any solution that works without a bugfix is highly welcome.

**Apache Airflow version**: 1.10.12

**Environment**:
- **Cloud provider or hardware configuration**: Docker image in Oracle VirtualBox
- **OS** (e.g. from /etc/os-release): RHEL8
- **Others**: Python 3.8

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
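To make the execution_date mismatch in issue 4 concrete, here is a minimal illustration. This is NOT Airflow's actual code; `clean_execution_date` is a made-up stand-in for whatever transformation produces the underscore format seen in the JSON lines above:

```python
# Illustration only: why the webserver's rendered log_id and the
# execution_date written into the JSON log lines can never match.
log_id_template = "{dag_id}-{task_id}-{execution_date}-{try_number}"

def clean_execution_date(iso_date: str) -> str:
    # The JSON lines above carry an execution_date with ':', '-', '+'
    # and '.' replaced by '_', e.g. "2020_09_21T14_00_00_000000".
    for ch in ":-+.":
        iso_date = iso_date.replace(ch, "_")
    return iso_date

# What the web GUI queries for (ISO format, see the access log above):
webserver_log_id = log_id_template.format(
    dag_id="stg_cam_asm_load",
    task_id="asm_stage",
    execution_date="2020-09-21T14:00:00+00:00",
    try_number=1,
)

# What actually lands in the indexed documents:
logged_date = clean_execution_date("2020-09-21T14:00:00.000000")

assert webserver_log_id == "stg_cam_asm_load-asm_stage-2020-09-21T14:00:00+00:00-1"
assert logged_date == "2020_09_21T14_00_00_000000"
assert logged_date not in webserver_log_id  # the two formats never line up
```

So a query built from the ISO-formatted date can never hit a document that only carries the underscore variant.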
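For bugfix wish 1, one conceivable shape for the enrichment is a standard `logging.Filter` that stamps every record with the log_id and an increasing offset. This is a sketch, not Airflow's implementation; the constructor arguments are hypothetical placeholders for values taken from the task context:

```python
import logging
import time

class TaskContextFilter(logging.Filter):
    """Stamp every record with a log_id and an increasing offset so
    the webserver could find and order the lines in Elasticsearch."""

    def __init__(self, dag_id, task_id, execution_date, try_number):
        super().__init__()
        # Same shape as log_id_template in the configuration above.
        self.log_id = f"{dag_id}-{task_id}-{execution_date}-{try_number}"

    def filter(self, record):
        record.log_id = self.log_id
        # A nanosecond timestamp gives a non-decreasing per-line offset.
        record.offset = time.time_ns()
        return True
```

Whether the 1.10.12 JSON formatter would pick up such extra record attributes via `json_fields` is an assumption on my part; the point is only that log_id and offset have to be attached to every single record.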
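To check whether manually ingested lines are retrievable at all, one can issue a query equivalent to what the webserver appears to do: match on log_id and order by offset. The webserver builds its query through a client library, so this raw body is only an assumption-laden sketch:

```python
import json

def build_log_query(log_id: str, after_offset: int = 0) -> str:
    """Raw Elasticsearch query body: all lines of one task try
    newer than after_offset, ordered by offset ascending."""
    body = {
        "query": {
            "bool": {
                "filter": [
                    {"match_phrase": {"log_id": log_id}},
                    {"range": {"offset": {"gt": after_offset}}},
                ]
            }
        },
        "sort": [{"offset": {"order": "asc"}}],
    }
    return json.dumps(body)

# e.g. POST this body to http://elasticsearch_de:9200/airflow/_search
```

If this query returns no hits for the log_id the web GUI renders, the GUI log page will necessarily stay empty.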
