larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add
log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r366655381
##########
File path: airflow/utils/log/es_task_handler.py
##########
@@ -255,7 +256,9 @@ def close(self):
# Mark the end of file using end of log mark,
# so we know where to stop while auto-tailing.
- self.handler.stream.write(self.end_of_log_mark)
+ if self.write_stdout:
+ print()
+ self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
Review comment:
from 5528:
When the end_of_log_mark is wrapped in a log record, the
end_of_log_mark can no longer be
able to match the log line in _read:
metadata['end_of_log'] = False if not logs \
else logs[-1].message == self.end_of_log_mark.strip()
It leads to the UI keeps calling backend and generates lots of load to
ES.
Removing the log_id from the end-of-log mark would make this worse: the UI
searches for the end-of-log mark by log_id, so it would keep looking for the
mark and never find it.
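A minimal sketch of why the log_id matters, using an in-memory list in place of
Elasticsearch. The document layout, the mark value, the example log_id, and the
`find_end_of_log` helper are all assumptions made for illustration; this is not
Airflow's actual lookup code.

```python
END_OF_LOG_MARK = "end_of_log"  # assumed value, for illustration only


def find_end_of_log(documents, log_id):
    """Return True if the last document stored under log_id is the mark."""
    # Only documents carrying the requested log_id are visible to the search.
    matching = [doc for doc in documents if doc.get("log_id") == log_id]
    return bool(matching) and matching[-1]["message"] == END_OF_LOG_MARK


log_id = "example_dag-example_task-2020-01-01-1"  # hypothetical id

with_log_id = [
    {"log_id": log_id, "message": "task started"},
    {"log_id": log_id, "message": END_OF_LOG_MARK},
]
without_log_id = [
    {"log_id": log_id, "message": "task started"},
    {"message": END_OF_LOG_MARK},  # mark written without a log_id
]

found_with = find_end_of_log(with_log_id, log_id)
found_without = find_end_of_log(without_log_id, log_id)
```

In this sketch the mark without a log_id is filtered out before the comparison,
so a caller polling on `found_without` would never see the end of the log.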
I am not sure what the sentence "When the end_of_log_mark is wrapped in a log
record" means. I also observed that the end-of-log mark can end up on the same
line as other log output, which prevents us from finding it in those cases. To
address that, I always add an obnoxious print() right before the end-of-log
mark line, to ensure the mark is always on a separate line when printing to the
console. This is important for filebeat/logstash on Kubernetes, so that they
pick up the end-of-log mark line as a separate document.
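A rough sketch of the same-line problem, assuming a line-oriented shipper that
turns each console line into one document and an exact-match check on the
mark. The mark value and the helper names `write_logs` and `has_mark_line` are
hypothetical.

```python
import io

END_OF_LOG_MARK = "end_of_log"  # assumed value, for illustration only


def write_logs(stream, separate_mark):
    """Write task output that lacks a trailing newline, then the mark."""
    stream.write("last task output without newline")
    if separate_mark:
        # Plays the role of the extra print() before the mark.
        stream.write("\n")
    stream.write(END_OF_LOG_MARK + "\n")


def has_mark_line(text):
    """Return True if some line consists of the mark and nothing else."""
    return any(line == END_OF_LOG_MARK for line in text.splitlines())


merged = io.StringIO()
write_logs(merged, separate_mark=False)

separated = io.StringIO()
write_logs(separated, separate_mark=True)

mark_found_merged = has_mark_line(merged.getvalue())
mark_found_separated = has_mark_line(separated.getvalue())
```

Without the extra newline the mark is glued to the previous output and no line
matches it exactly; with the newline the mark gets a line of its own.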