jason810496 opened a new pull request, #54552:
URL: https://github.com/apache/airflow/pull/54552

   related:
   - https://github.com/apache/airflow/pull/54445
   - https://github.com/apache/airflow/pull/49470
   
   ## Why
   
   The `get_log` API with the `application/nd-json` header should stream logs 
to the end, but the `FileTaskHandler` can only read content **that has already 
been flushed to the file and cannot access content that is still being 
written**.
   
   ## What
   
   There should be a polling mechanism to check for new changes to the file so 
that the API connection can remain open for streaming content to the frontend.
   
   - Add `stream_file_until_close` with `poll_interval = 0.1` and `idle_timeout 
= 10.0` seconds to keep streaming the file content until it remains unchanged 
for longer than the `idle_timeout` threshold.
   - Add time-based flushing for `_interleave_logs` so that logs are flushed 
based on a time interval, even if the heap size doesn't reach `HEAP_DUMP_SIZE`, 
to prevent frontend display delays.
   - Remove `log_pos` from log metadata.
   - Remove `LogStreamAccumulator` so that the API can raise log records as 
soon as possible.
     - Since `LogStreamAccumulator` flushes the log stream to a temp file to 
get the total log lines for the frontend, it requires waiting until the log 
stream ends before it can replay the log stream.
   
   ## Example Dag for Testing the `get_log` API Streaming Results
   
   ```python
   with DAG(
       dag_id="test_streaming_log",
       start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
       schedule=None,
   ):
   
       @task()
       def produce_slowly():
           import time
           print("Large task logs DAG starting")
   
           for _ in range(800):
               print(uuid4())
               time.sleep(1)
           
           print("Large task logs DAG done")
           return
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to