ashb commented on issue #46657: URL: https://github.com/apache/airflow/issues/46657#issuecomment-2668037123
Oh yeah, partial requests, range requests, fetching a smaller number of lines, etc. need a more in-depth overhaul of the entire log reading pipeline (which I think we should do). It currently does a lot of things:

- It reads from multiple sources
- It then parses each line to try and get the timestamp/date from it
- It sorts the lines
- Then, due to what I think is another bug[1], it dedupes the lines!

[1] The "bug" is that it looks at a local file and then also asks the remote log URL. So in the case of LocalExecutor in breeze, or a single Airflow pod, the webserver can read the file and get one set of logs there, and then it also hits the log HTTP server and gets _the same content_ there. Deduping is a bit of a hammer for this problem; we should come up with a more elegant solution. But that falls into a wider "let's look at logging" issue than the one this issue is talking about.
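To make the four steps concrete, here is a minimal Python sketch of a read/parse/sort/dedupe pipeline. It is not Airflow's actual log reader: the function names `parse_lines` and `read_logs` and the `[YYYY-MM-DDTHH:MM:SS] message` line format are hypothetical, chosen only to show why reading the same file twice (local and via the log HTTP server) ends up needing a dedupe step.

```python
import re
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical line format for this sketch: "[YYYY-MM-DDTHH:MM:SS] message".
TS_PATTERN = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\]")


def parse_lines(text: str) -> List[Tuple[datetime, str]]:
    """Tag each line with a timestamp; lines without one inherit the previous timestamp."""
    current = datetime.min
    tagged = []
    for line in text.splitlines():
        match = TS_PATTERN.match(line)
        if match:
            current = datetime.fromisoformat(match.group(1))
        tagged.append((current, line))
    return tagged


def read_logs(sources: Dict[str, str]) -> List[str]:
    """Hypothetical pipeline: merge all sources, sort by timestamp, drop repeated lines."""
    merged: List[Tuple[datetime, str]] = []
    for text in sources.values():          # 1. read from multiple sources
        merged.extend(parse_lines(text))   # 2. parse a timestamp for each line
    merged.sort(key=lambda item: item[0])  # 3. stable sort by timestamp
    seen = set()
    result = []
    for _, line in merged:                 # 4. dedupe on the raw line text
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


# On a single host, the local file and the log server return identical content.
local = "[2025-01-01T10:00:00] Task started\n[2025-01-01T10:00:05] Task finished"
print(read_logs({"local": local, "http": local}))
# ['[2025-01-01T10:00:00] Task started', '[2025-01-01T10:00:05] Task finished']
```

Note that deduping on line text is exactly the "hammer" described above: a line that legitimately appears twice (for example, a repeated retry message or identical traceback lines) is also collapsed into one. Avoiding the duplicate read at the source, rather than filtering afterwards, would sidestep that.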
