topherinternational commented on code in PR #53821:
URL: https://github.com/apache/airflow/pull/53821#discussion_r2305403161
##########
providers/elasticsearch/src/airflow/providers/elasticsearch/log/es_task_handler.py:
##########
@@ -368,7 +396,19 @@ def _read(
         # If we hit the end of the log, remove the actual end_of_log message
         # to prevent it from showing in the UI.
         def concat_logs(hits: list[Hit]) -> str:
-            log_range = (len(hits) - 1) if hits[-1].message == self.end_of_log_mark else len(hits)
+            # In Airflow 2.x, the log record JSON has a "message" key, e.g.:
+            # {
+            #     "message": "Dag name:dataset_consumes_1 queued_at:2025-08-12 15:05:57.703493+00:00",
+            #     "offset": 1755011166339518208,
+            #     "log_id": "dataset_consumes_1-consuming_1-manual__2025-08-12T15:05:57.691303+00:00--1-1"
+            # }
+            #
+            # In Airflow 3.x, the "message" field is renamed to "event".
+            # We check the correct attribute depending on the Airflow major version.
+            if AIRFLOW_V_3_0_PLUS:
+                log_range = (len(hits) - 1) if hits[-1].event == self.end_of_log_mark else len(hits)
Review Comment:
I'm not sure I'm reading this right, but does this break backwards compatibility for reading Airflow 2 logs? The real factor for which key to use isn't which version the stack is running on, but which version was used to write the ES record. (Ditto above for lines 359-362.)
If that's the case, could this be fixed by inferring the hit format with a contains or `hasattr` check?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]