The GitHub Actions job "Tests (AMD)" on 
airflow.git/fix/kpo-empty-log-line-pollution-36571 has succeeded.
Run started by GitHub user kaxil (triggered by kaxil).

Head commit for run:
ed18cfb3f6dab90674d83e0a10fe43ec2c66cf5c / Kaxil Naik <[email protected]>
Fix KubernetesPodOperator emitting orphan timestamps for empty container writes

When a container running under KPO writes an empty line, kubelet streams
it back (with ``timestamps=True``) as ``"<rfc3339-ts> \n"`` -- a timestamp
followed by a separator space and an empty message. ``parse_log_line``
called ``line.strip().partition(" ")`` which removed the trailing
separator space before partitioning, so the function returned
``timestamp=None`` and the caller treated the line as a continuation of
the previous buffered log record. The bare RFC3339 string was then
appended onto the previous message and emitted as a multi-line log
where only the first line carried the Airflow ``[ts] {pod_manager.py:N}
INFO -`` prefix, leaving unprefixed timestamp rows interleaved in task
logs.
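
The failure mode described above can be reproduced with plain string
operations. The following is a minimal sketch, not the actual Airflow
source: ``parse_log_line_buggy`` is a hypothetical name, and the sample
timestamps are made up for illustration.

```python
def parse_log_line_buggy(line: str):
    """Sketch of the strip+partition pattern (hypothetical helper)."""
    # strip() removes the trailing "\n" AND the separator space that
    # follows the timestamp when the message is empty.
    timestamp, sep, message = line.strip().partition(" ")
    if not sep:
        # No separator found: report timestamp=None so the caller treats
        # the whole line as a continuation of the previous record.
        return None, line
    return timestamp, message


normal = "2024-01-01T12:00:00.000000000Z hello\n"
empty_write = "2024-01-01T12:00:00.000000000Z \n"

print(parse_log_line_buggy(normal))       # timestamp and message split
print(parse_log_line_buggy(empty_write))  # timestamp is None
```

For the empty write, the bare timestamp string comes back as the
"message", which is how it ends up appended to the previous record.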

Downstream, this breaks
``airflow.utils.log.file_task_handler._parse_timestamp`` (which feeds
the line to ``pendulum.parse`` after stripping ``[]``): malformed
fragments from these orphan rows can raise
``ValueError: month must be in 1..12`` and fail the task entirely.

The fix:

* ``parse_log_line`` no longer pre-strips the line; it ``rstrip("\n")``
  only and partitions on the original separator, so empty container
  writes are recognised as ``(timestamp, "")`` rather than as
  continuations. It also catches ``ValueError`` (not just
  ``ParserError``) so a malformed timestamp can never escape.
* The sync and async log consumer loops skip emitting empty messages
  -- the resume marker still advances in the sync path, but no noisy
  ``[base] `` row is written.
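
The two changes above might look roughly like the sketch below. It
assumes a simplified, stdlib-only timestamp parser (``_parse_ts``) in
place of the real one, and ``parse_log_line_fixed`` and ``consume`` are
hypothetical names, not the functions in the provider.

```python
from datetime import datetime, timezone


def _parse_ts(ts: str) -> datetime:
    """Hypothetical stand-in parser; raises ValueError on malformed input."""
    head, _, frac = ts.rstrip("Z").partition(".")
    parsed = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S")
    # Keep at most microsecond precision from the fractional part.
    micro = int(frac[:6].ljust(6, "0")) if frac else 0
    return parsed.replace(microsecond=micro, tzinfo=timezone.utc)


def parse_log_line_fixed(line: str):
    # Only the newline is removed, so the separator space survives and an
    # empty container write parses as (timestamp, "").
    timestamp, sep, message = line.rstrip("\n").partition(" ")
    if not sep:
        return None, line
    try:
        return _parse_ts(timestamp), message
    except ValueError:
        # A malformed timestamp is treated as a continuation line.
        return None, line


def consume(lines):
    """Simplified consumer loop: returns (emitted records, resume marker)."""
    records = []
    last_seen = None
    for line in lines:
        ts, message = parse_log_line_fixed(line)
        if ts is None:
            text = message.rstrip("\n")
            if records:
                records[-1] += "\n" + text  # continuation of previous record
            else:
                records.append(text)
            continue
        last_seen = ts  # resume marker advances even for empty writes
        if not message:
            continue  # skip emitting an empty row
        records.append(message)
    return records, last_seen


sample = [
    "2024-01-01T12:00:00.000000000Z first\n",
    "2024-01-01T12:00:01.000000000Z \n",
    "2024-01-01T12:00:02.000000000Z second\n",
]
records, last_seen = consume(sample)
print(records, last_seen)
```

In this sketch the empty write produces no record of its own and is not
merged into ``first``, while the resume marker still moves past it.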

Regressed in #33675 (cncf-kubernetes 7.5.0, Aug 2023), which replaced
the original ``line.find(" ")`` split with the strip+partition pattern
as part of a refactor. The pre-refactor implementation
correctly handled ``<ts> \n`` because ``find(" ")`` matched the
separator space directly. Reported in #36571 against 7.12.0 / 7.13.0
and still reproducible on current main.
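
For comparison, a ``find(" ")``-based split handles the empty write
because it locates the separator before any stripping happens. This is
an illustrative sketch only; ``split_with_find`` is a hypothetical name
and the body is not copied from the pre-refactor source.

```python
def split_with_find(line: str):
    """Sketch of a find-based split (hypothetical helper)."""
    split_at = line.find(" ")
    if split_at == -1:
        # No separator at all: treat the line as a continuation.
        return None, line
    # The separator is located first; trailing whitespace is removed
    # from the message only.
    return line[:split_at], line[split_at + 1:].rstrip()


print(split_with_find("2024-01-01T12:00:00.000000000Z \n"))
print(split_with_find("2024-01-01T12:00:00.000000000Z hello\n"))
```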

Report URL: https://github.com/apache/airflow/actions/runs/26584673108

With regards,
GitHub Actions via GitBox


