sun2everyone opened a new issue, #28326:
URL: https://github.com/apache/airflow/issues/28326

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   **Airflow version:** Airflow 2.4.2 with Elasticsearch log provider (ELK 
7.17.0)
   **Python version:** 3.8.13
   
   **Short description:**
   Airflow webserver stops responding due to an infinite log-loading loop. 
   
   **Details:**
   If several users open the grid view page and try to view the logs of a running 
task, they see no logs, but the gunicorn workers become busy processing the 
log-loading requests. After some time, all gunicorn workers are busy waiting for 
the end of the log, and the webserver stops responding to further client requests.
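
To illustrate the failure mode described above, here is a minimal, standard-library-only sketch. It is a simulation, not Airflow or gunicorn code: `stream_logs`, `health_check`, and `simulate` are hypothetical names, and a thread pool stands in for a fixed set of sync workers. Each simulated log request polls until the task finishes, so once there are as many open log views as workers, an unrelated request cannot be served.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def stream_logs(task_finished: threading.Event, poll_interval: float = 0.01) -> int:
    """Hypothetical stand-in for a log stream that polls until the task ends."""
    polls = 0
    while not task_finished.is_set():
        polls += 1  # stands in for one _count/_search round trip
        time.sleep(poll_interval)
    return polls


def health_check() -> str:
    """Hypothetical stand-in for any other short request to the webserver."""
    return "ok"


def simulate(num_workers: int, num_log_viewers: int, timeout: float = 0.2) -> bool:
    """Return True if a health check is answered within `timeout` seconds."""
    task_finished = threading.Event()
    pool = ThreadPoolExecutor(max_workers=num_workers)  # fixed pool of "workers"
    try:
        # Every open log view occupies one worker for as long as the task runs.
        for _ in range(num_log_viewers):
            pool.submit(stream_logs, task_finished)
        probe = pool.submit(health_check)
        try:
            return probe.result(timeout=timeout) == "ok"
        except FutureTimeout:
            return False
    finally:
        # Let the simulated task finish so the pool can shut down cleanly.
        task_finished.set()
        pool.shutdown(wait=True)
```

With these assumptions, `simulate(4, 3)` leaves one worker free to answer the health check, while `simulate(4, 4)` leaves none and the check times out.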
   
   **What you can see in the webserver logs:**
   Constantly repeating requests to Elasticsearch:
   ```
   {base.py:270} INFO - POST http://127.0.0.1:9202/_count [status:200 
request:0.006s]
   {base.py:270} INFO - POST http://127.0.0.1:9202/_search [status:200 
request:0.015s]
   {base.py:270} INFO - POST http://127.0.0.1:9202/_search [status:200 
request:0.015s]
   {base.py:270} INFO - POST http://127.0.0.1:9202/_count [status:200 
request:0.010s]
   {base.py:270} INFO - POST http://127.0.0.1:9202/_search [status:200 
request:0.014s]
   {base.py:270} INFO - POST http://127.0.0.1:9202/_count [status:200 
request:0.011s]
   ```
   
   ### What you think should happen instead
   
   When trying to view the logs of a running task in grid view, you should get 
one of the following:
   1. A message that the logs of a running task cannot be shown
   2. The partial logs that exist at the moment you load the page
   3. Continuously appearing logs, as on the main task log page
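
As a rough illustration of option 2, the sketch below returns whatever log lines already exist and then stops, instead of waiting for an end-of-log marker. It does not use Airflow's `TaskLogReader` API: `read_available_logs`, `FetchChunk`, and `make_running_task_fetcher` are hypothetical names, and the fetcher is assumed to return a `(new_lines, end_of_log)` pair for a given offset.

```python
import time
from typing import Callable, Iterator, List, Tuple

# Hypothetical fetcher signature: takes an offset, returns (new_lines, end_of_log).
FetchChunk = Callable[[int], Tuple[List[str], bool]]


def read_available_logs(
    fetch_chunk: FetchChunk,
    max_empty_polls: int = 1,
    max_seconds: float = 5.0,
) -> Iterator[str]:
    """Yield the log lines that exist now, then stop instead of waiting for the end."""
    deadline = time.monotonic() + max_seconds
    offset = 0
    empty_polls = 0
    while time.monotonic() < deadline:
        lines, end_of_log = fetch_chunk(offset)
        yield from lines
        offset += len(lines)
        if end_of_log:
            return  # the task finished and the log is complete
        # Count consecutive polls that returned nothing new.
        empty_polls = 0 if lines else empty_polls + 1
        if empty_polls >= max_empty_polls:
            return  # nothing new right now: release the worker


def make_running_task_fetcher(stored: List[str], page_size: int = 2) -> FetchChunk:
    """Hypothetical fetcher for a still-running task: it never reports end_of_log."""

    def fetch(offset: int) -> Tuple[List[str], bool]:
        return stored[offset:offset + page_size], False

    return fetch
```

Under these assumptions, `list(read_available_logs(make_running_task_fetcher(["a", "b", "c"])))` returns the three stored lines and then stops, even though the simulated task never signals the end of its log.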
   
   ### How to reproduce
   
   1. Start the Airflow webserver with ELK logging and sync gunicorn workers
   2. Create a DAG with a task that runs for a long time and constantly 
writes to its logs
   3. Run that DAG
   4. Open that DAG in grid view and try to view the logs of the running task 
there
   5. Open more pages viewing the logs of that running task in grid view (5-6 or 
more)
   6. The webserver starts to respond slowly, misses health checks, and stops 
responding to client requests
   
   ### Operating System
   
   Ubuntu 20.04.2 LTS
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-elasticsearch    4.2.1
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   I guess the problem is in these parts:
   
   - 
https://github.com/apache/airflow/blob/febf35500d5de172e25280e5f5492257f898fdf5/airflow/api_connexion/endpoints/log_endpoint.py#L111
   - 
https://github.com/apache/airflow/blob/fa2bec042995004f45b914dd1d66b466ccced410/airflow/utils/log/log_reader.py#L80
 
   
   The log-reading stream keeps a gunicorn worker busy for as long as the task 
is in the Running state. 
   
   Using async workers such as tornado helps a little, but doesn't solve 
the problem. For now, I use this workaround:
   ```
   --- log_endpoint_py.default    2022-12-12 19:16:30.280526903 +0300
   +++ log_endpoint.py    2022-12-12 19:03:29.128970011 +0300
   @@ -33,7 +33,7 @@
    from airflow.utils.airflow_flask_app import get_airflow_app
    from airflow.utils.log.log_reader import TaskLogReader
    from airflow.utils.session import NEW_SESSION, provide_session
   -
   +from airflow.utils.state import State 
   
   @security.requires_access(
        [
   @@ -108,6 +108,8 @@
            token = URLSafeSerializer(key).dumps(metadata)  # type: 
ignore[assignment]
            return logs_schema.dump(LogResponseObject(continuation_token=token, 
content=logs))
        # text/plain. Stream
   -    logs = task_log_reader.read_log_stream(ti, task_try_number, metadata)
   -
   +    if ti.state not in State.running:
   +        logs = task_log_reader.read_log_stream(ti, task_try_number, 
metadata)
   +    else:
   +        logs = {"Task is still running, logs in grid view not available."}
        return Response(logs, headers={"Content-Type": return_type}) 
   ```
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

