potiuk commented on code in PR #28818:
URL: https://github.com/apache/airflow/pull/28818#discussion_r1065637360


##########
airflow/utils/log/log_reader.py:
##########
@@ -77,12 +78,16 @@ def read_log_stream(self, ti: TaskInstance, try_number: int 
| None, metadata: di
             metadata.pop("max_offset", None)
             metadata.pop("offset", None)
             metadata.pop("log_pos", None)
-            while "end_of_log" not in metadata or (
-                not metadata["end_of_log"] and ti.state not in [State.RUNNING, 
State.DEFERRED]
-            ):
+            while True:
                 logs, metadata = self.read_log_chunks(ti, current_try_number, 
metadata)
                 for host, log in logs[0]:
                     yield "\n".join([host or "", log]) + "\n"
+                if "end_of_log" not in metadata or (
+                    not metadata["end_of_log"] and ti.state not in 
[State.RUNNING, State.DEFERRED]
+                ):
+                    time.sleep(0.5)

Review Comment:
   Well. I am not complaining about the value, but about it being "magic".
   
   I find such magic numbers often necessary for tests but in "production" 
code, I think they always have to have at least some justification written down.
   
   If at any point I have a magic number like that and I have no good 
explanation why this is the value chosen here It is always a red flag. Imagine 
someone else coming here a year from now trying to solve an issue with log 
streaming and askiong a question - why do we have 0.5 here ?. 
   
   The answer should be at least documented somewhere with reasoning - because 
code does not explain it.
   
   Of coudse iddeally this should be an async wait for the condition to get it 
out. But I understand it is  not possible. So having some "default" value here 
might make sense. The tradeoff is - as I understand is that we will get logs 
statistically with 0.25 sec delay but then the cpu usage will be much lower. 
   
   But there are at least few questions. How much is ok? What's the acceptable  
delay?  Should it be different by default for different handlers? Is it needed 
for those other handlers as well? Or maybe we should add a possibility of 
specifying it differently for each handler?
   
   Maybe answer to all those question is (after applying a common-sense 
judgment):
   
   > The 0.5 s delay here is ok because it causes average 250ms delay on 
streaming the logs. but it lowers the CPU usage. While we could make it 
handler-dependent, we decided to hard-code it as 0.5s for all handlers for 
simplicity
   
   But then your judgment now might be different in the future - so for the 
future-self and future-others having this commented in the code would be the 
right thing to do I think, this way the person in the future (either yourself 
or someone else) will not loose mental capacity on finding out why 0.5 was 
chosen.
   
   Just this :).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to