ashb commented on issue #46657:
URL: https://github.com/apache/airflow/issues/46657#issuecomment-2668037123

   Oh yeah, partial requests, range requests, returning a smaller number of 
lines, etc. all need a more in-depth overhaul of the entire log reading 
pipeline (which I think we should do).
   
   It currently does a lot of things:
   - It reads from multiple sources
   - It then parses each line to try to get the timestamp/date from it
   - It sorts the lines
   - Then, due to what I think is another bug[1], it dedupes the lines!
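
   To make those steps concrete, here is a minimal sketch of that kind of 
read/parse/sort/dedupe pipeline. It is not Airflow's actual code: the names 
`parse_timestamp` and `merge_logs` and the timestamp format are assumptions 
made up for illustration, and the real reader may well differ in each step.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Assumed timestamp layout at the start of a line; real task logs may differ.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S"
TS_LENGTH = 19  # length of e.g. "2025-02-19T10:00:00"


def parse_timestamp(line: str) -> Optional[datetime]:
    """Try to read a timestamp from the start of a log line."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def merge_logs(sources: Iterable[Iterable[str]]) -> List[str]:
    """Hypothetical helper: interleave lines from several sources by timestamp."""
    records: List[Tuple[datetime, str]] = []
    for source in sources:
        last_ts = datetime.min
        for line in source:
            ts = parse_timestamp(line)
            if ts is None:
                # Lines without a timestamp (e.g. tracebacks) inherit the
                # timestamp of the previous line from the same source.
                ts = last_ts
            last_ts = ts
            records.append((ts, line))

    # Stable sort on the timestamp only, so ties keep their arrival order.
    records.sort(key=lambda record: record[0])

    # Blunt dedupe: any (timestamp, text) pair is emitted only once, which
    # also drops genuine repeats that share a timestamp.
    seen = set()
    merged: List[str] = []
    for record in records:
        if record in seen:
            continue
        seen.add(record)
        merged.append(record[1])
    return merged
```

   Note that every line from every source has to be read and parsed before 
anything can be returned, which is why serving only part of a log does not fit 
this shape without a larger rework.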
   
   
   [1] The "bug" is that it looks at a local file and then also asks the 
remote log URL. So in the case of LocalExecutor in breeze, or a single Airflow 
pod, the webserver can read the file and gets one set of logs there, and then 
it also hits the log HTTP server and gets _the same content_ there. Deduping is 
a bit of a hammer for this problem, and we should come up with a more elegant 
solution. But that falls into a wider "let's look at logging" issue than this 
one is about.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

