msimmoteit-neozo commented on PR #52164:
URL: https://github.com/apache/airflow/pull/52164#issuecomment-3016836668
First of all, I want to thank you for taking your time and looking into my
PR. I appreciate that.
I think, I made the reason for this PR clear in my comment. But I want to
add a bit of context. We're using the SparkKubernetesOperator and I wanted to
use the `progress_callback`. Usually the startup of the Pod looks like this in
the Airflow Logs:
```
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] ++ id -u
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + myuid=185
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] ++ id -g
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + mygid=0
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + set +e
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] ++ getent passwd 185
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + uidentry=
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + set -e
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -z '' ']'
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -w /etc/passwd ']'
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + echo '185:x:185:0:anonymous
uid:/opt/spark:/bin/false'
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -z /opt/java/openjdk ']'
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + SPARK_CLASSPATH=':/opt/spark/jars/*'
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + env
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + grep SPARK_JAVA_OPT_
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + sort -t_ -k4 -n
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + sed 's/[^=]*=\(.*\)/\1/g'
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] ++ command -v readarray
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' readarray ']'
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + readarray -t SPARK_EXECUTOR_JAVA_OPTS
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -n '' ']'
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -z ']'
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -n '' ']'
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -z ']'
[2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -z x ']'
```
I added a `callbacks` argument with a `progress_callback` equivalent to this:
```
@staticmethod
def progress_callback(*, line: str, client, mode: str, **kwargs):
logger.log.warning(f"The log line: {line}")
```
using the logging functionality of Airflow. Normally, I would expect all log
lines to be duplicated, yet this is what happened:
```
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] ++ id -u
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + myuid=185
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] ++ id -g
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + mygid=0
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + set +e
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] ++ getent passwd 185
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + uidentry=
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + set -e
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -z '' ']'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -w /etc/passwd ']'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + echo '185:x:185:0:anonymous
uid:/opt/spark:/bin/false'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -z /opt/java/openjdk ']'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + SPARK_CLASSPATH=':/opt/spark/jars/*'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + env
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + grep SPARK_JAVA_OPT_
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + sort -t_ -k4 -n
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + sed 's/[^=]*=\(.*\)/\1/g'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] ++ command -v readarray
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' readarray ']'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + readarray -t SPARK_EXECUTOR_JAVA_OPTS
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -n '' ']'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -z ']'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -z ']'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -n '' ']'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -z ']'
[2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line:
2025-06-24T10:10:05.975999206Z ++ id -u
[2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO -
[spark-kubernetes-driver] + '[' -z x ']'
```
My callback was repeatedly called with the first line ever logged. I am
aware that our deployment is a few versions behind, but this is why I created
the unit test.
Best regards
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]