bladerail opened a new issue, #40571:
URL: https://github.com/apache/airflow/issues/40571

   ### Apache Airflow Provider(s)
   
   docker
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-docker==3.10.0
   
   ### Apache Airflow version
   
   2.9.1-python3.11
   
   ### Operating System
   
   CentOS 7
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   I am running a Docker Swarm cluster of 3 manager hosts and 10 worker hosts, all on CentOS 7 with Docker 24.0.7.
   
   ### What happened
   
   We run our DAGs as Python scripts packaged inside Docker containers. When such a script uses the [tqdm](https://pypi.org/project/tqdm/) progress bar, the DAG is marked as failed because an exception is raised in DockerSwarmOperator's `stream_new_logs` method, which expects each log message to begin with a timestamp. The Docker container itself continues running to completion, but by then the DAG has already been marked as failed by the raised exception.
   
   As a result, the container/service does not get cleaned up, the logs of the successful run are not displayed, and the DAG is mistakenly shown as a failed run.
   
   An example error log has been attached.
   [error.txt](https://github.com/user-attachments/files/16076781/error.txt)
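   
   To illustrate the failure mode, here is a minimal sketch. The `parse_timestamp` helper below is hypothetical and only stands in for the timestamp parsing the operator appears to do; it is not the provider's actual code. A Docker log line normally begins with an RFC 3339 timestamp, while a tqdm progress fragment does not, so parsing it raises `ValueError`:
   ```python
   from datetime import datetime
   
   
   def parse_timestamp(line: str) -> datetime:
       # Hypothetical helper: with timestamps enabled, Docker log lines look like
       # "2024-07-01T12:34:56.789012345Z <message>".
       ts, _, _ = line.partition(" ")
       return datetime.strptime(ts[:26], "%Y-%m-%dT%H:%M:%S.%f")
   
   
   good = "2024-07-01T12:34:56.789012345Z 775"
   bad = " 8%|#         | 775/10000 [00:00<00:05, 1729.46it/s]"  # tqdm fragment, no timestamp
   
   parse_timestamp(good)  # parses fine
   parse_timestamp(bad)   # raises ValueError, and the task is marked failed
   ```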
   
   
   ### What you think should happen instead
   
   Since the Python script inside the Docker container actually runs to completion successfully, Airflow should display this as a successful run instead of a failure.
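   
   One possible direction (a hedged sketch only, not the provider's actual code, reusing the hypothetical `parse_timestamp` helper from above) would be to tolerate log chunks whose leading timestamp cannot be parsed instead of raising, so that cosmetic output such as tqdm's carriage-return updates cannot fail the task:
   ```python
   def iter_log_messages(lines):
       """Yield (timestamp, message) pairs, tolerating chunks without a timestamp."""
       for line in lines:
           try:
               ts = parse_timestamp(line)  # hypothetical helper from the sketch above
           except ValueError:
               ts = None  # e.g. a tqdm progress fragment; keep the line, don't fail
           yield ts, line
   ```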
   
   ### How to reproduce
   
   1. Create a Docker Swarm.
   2. Start Airflow with default settings.
   3. Create a Docker image that runs the following Python code.
   ```python
   from tqdm import tqdm
   
   
   if __name__ == "__main__":
       for i in tqdm(range(10000), total=10000):
           if i % 5 == 0:
               print(i)
   ```
   4. Create a DAG that runs the Docker image created in step 3 (a minimal sketch is shown after this list).
   5. Run the DAG created in step 4.
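   
   A minimal sketch of step 4, assuming the image from step 3 is tagged `tqdm-demo:latest` (the image name and `dag_id` are placeholders, not taken from the report):
   ```python
   from datetime import datetime
   
   from airflow import DAG
   from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator
   
   with DAG(
       dag_id="tqdm_swarm_demo",
       start_date=datetime(2024, 1, 1),
       schedule=None,
       catchup=False,
   ) as dag:
       run_tqdm = DockerSwarmOperator(
           task_id="run_tqdm",
           image="tqdm-demo:latest",  # placeholder tag for the image built in step 3
           enable_logging=True,       # log streaming is where the failure surfaces
       )
   ```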
   
   ### Anything else
   
   I am attaching several screenshots below. They show different runs of the same DAG failing at different points, which means that, for processes where the tqdm progress bar is short enough, the DAG can actually pass without being affected by the `stream_new_logs` issue. I used a Python script that runs tqdm from 0 to 10000, printing on every 5th iteration. The screenshots show that Airflow is generally able to display the logs before failing at varying points, as early as 775 and as late as 8170. This DAG has never succeeded, but I was able to verify via the dead containers that every run actually completed (`docker logs ${container_id}` shows tqdm reaching 10000); a sketch of that check follows.
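   
   For reference, the completion check can also be scripted with the Docker SDK for Python (a hedged sketch; it simply lists stopped containers on the host where the service task ran and prints their exit codes and last log lines):
   ```python
   import docker
   
   client = docker.from_env()
   for container in client.containers.list(all=True, filters={"status": "exited"}):
       exit_code = container.attrs["State"]["ExitCode"]
       last_lines = container.logs(tail=3).decode(errors="replace").splitlines()
       print(container.name, exit_code, last_lines)
   ```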
   
   
![Demo1](https://github.com/apache/airflow/assets/6542402/5d561b12-f621-4ef0-aaf9-a299f25dc263)
   
![Demo2](https://github.com/apache/airflow/assets/6542402/283bb62f-34bf-42b9-89a7-913eaccccd6b)
   
![Demo3](https://github.com/apache/airflow/assets/6542402/e94c6754-e725-4f88-8a0c-f0f3c71d4fc8)
   
![Demo4](https://github.com/apache/airflow/assets/6542402/4da312c6-88c5-49cc-827e-848ed92b02fd)
   
   
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

