msimmoteit-neozo commented on PR #52164:
URL: https://github.com/apache/airflow/pull/52164#issuecomment-3016836668

   First of all, I want to thank you for taking your time and looking into my 
PR. I appreciate that.
   
   I think, I made the reason for this PR clear in my comment. But I want to 
add a bit of context. We're using the SparkKubernetesOperator and I wanted to 
use the `progress_callback`. Usually the startup of the Pod looks like this in 
the Airflow Logs:
   ```
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] ++ id -u
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + myuid=185
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] ++ id -g
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + mygid=0
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + set +e
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] ++ getent passwd 185
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + uidentry=
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + set -e
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -z '' ']'
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -w /etc/passwd ']'
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + echo '185:x:185:0:anonymous 
uid:/opt/spark:/bin/false'
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -z /opt/java/openjdk ']'
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + SPARK_CLASSPATH=':/opt/spark/jars/*'
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + env
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + grep SPARK_JAVA_OPT_
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + sort -t_ -k4 -n
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + sed 's/[^=]*=\(.*\)/\1/g'
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] ++ command -v readarray
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' readarray ']'
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + readarray -t SPARK_EXECUTOR_JAVA_OPTS
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -n '' ']'
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -z ']'
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -n '' ']'
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -z ']'
   [2025-06-24, 09:54:44 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -z x ']'
   ```
   I added a `callbacks` argument with a `progress_callback` equivalent to this:
   ```
   @staticmethod
   def progress_callback(*, line: str, client, mode: str, **kwargs):
       logger.log.warning(f"The log line: {line}")
   ```
   using the logging functionality of Airflow. Normally, I would expect all log 
lines to be duplicated, yet this is what happened:
   ```
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] ++ id -u
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + myuid=185
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] ++ id -g
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + mygid=0
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + set +e
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] ++ getent passwd 185
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + uidentry=
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + set -e
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -z '' ']'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -w /etc/passwd ']'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + echo '185:x:185:0:anonymous 
uid:/opt/spark:/bin/false'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -z /opt/java/openjdk ']'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + SPARK_CLASSPATH=':/opt/spark/jars/*'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + env
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + grep SPARK_JAVA_OPT_
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + sort -t_ -k4 -n
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + sed 's/[^=]*=\(.*\)/\1/g'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] ++ command -v readarray
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' readarray ']'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + readarray -t SPARK_EXECUTOR_JAVA_OPTS
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -n '' ']'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -z ']'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -z ']'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -n '' ']'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -z ']'
   [2025-06-24, 10:10:16 UTC] {operator.py:36} WARNING - The log line: 
2025-06-24T10:10:05.975999206Z ++ id -u
   [2025-06-24, 10:10:16 UTC] {pod_manager.py:454} INFO - 
[spark-kubernetes-driver] + '[' -z x ']'
   ```
   
   My callback was repeatedly called with the first line ever logged. I am 
aware that our deployment is a few versions behind, but this is why I created 
the unit test.
   
   Best regards


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to