schattian commented on PR #23658:
URL: https://github.com/apache/airflow/pull/23658#issuecomment-1157407782

   >one alternative i'll just mention is we could handle this by doing our own 
"following" instead of relying on the stream. i.e. we do our own loop and 
consume the latest logs, then sleep a second and do the same.
   
   you mean calling X amount of times the kubectl logs api? I think that's a 
worse than a follow.. 
   Mostly because you'll have some lines printed twice (and could be expensive 
for some dags)
   
   >one is I think you need to kill the logging process when on_kill is called.
   
   Sorry, i dont think i understand what's the point here
   
   >the other is, if you get an error in await_container_completion, your 
subprocess won't be killed. so, you probably should modify so that if you get 
an unexpected error after process has started, that the process will be killed.
   
   Hmm but in that case, aren't we going to propagate a SIGTERM? (ie killing 
the subprocess implicitly)?
   I was just relying on that, but if that's not how it works / could change, i 
will modify that.
   
   >also, i'd make "private" the two methods you've added
   👍🏻 
   
   >separate question for you @schattian. suppose the stream hangs, and the pod 
is still running. if we terminate the process and restart logs consumption, 
will it get logs from the new location? or will it still look at the original 
location and therefore any new logging is unreachable and it will immediately 
hang?
   
   That is correct. A call to kubectl logs api after the stream hanged will 
return the newest logs.
   But, there are a few problems with that:
   
   1. The pod could be deleted, and in that case you'll receive no logs.
   2. We determine that the stream hangs as "if the container is not running 
and stream consumption didnt finish", as there is not a better way to do it 
(please let me know if you have one). In that case
   
   I have mentioned that in the first pr 
https://github.com/apache/airflow/pull/23618#discussion_r869096030
   I think it adds some complexity that's not worth as this bugs doesn't occur 
so often
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to