potiuk edited a comment on issue #20500:
URL: https://github.com/apache/airflow/issues/20500#issuecomment-1001709411


   >   This will change the growth of buffer from exponential to linear, and 
the users may hit the recursive stack limit before the out-of-memory error, and 
would more easily figure out the issue.
   
   Unfortunately this would change the logging behaviour, which we do not want 
to do, as it would affect legitimately written logs and use extra memory when 
not needed. What we do with log writing is correct: we should neither use an 
extra buffer nor strip messages. The problem is that when you write a log to 
stderr, stderr is routed directly back to the stream writer that propagates 
the logs further. It is not intended to redirect to itself, yet adding a 
StreamHandler makes that happen.
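
   As a rough illustration of the loop described above, here is a toy model. 
`ToyLogger` and `ToyStreamWriter` are hypothetical stand-ins, not Airflow or 
`logging` classes, and the `max_hops` cap is an assumption added only so the 
demo terminates; the real loop has no such limit.

```python
class ToyLogger:
    """Stand-in for a logger: hands each message to every attached stream."""

    def __init__(self):
        self.streams = []

    def log(self, message):
        for stream in self.streams:
            stream.write(message + "\n")


class ToyStreamWriter:
    """Stand-in for the object that replaces stderr and forwards to a logger."""

    def __init__(self, logger, max_hops):
        self.logger = logger
        self.max_hops = max_hops  # demo-only cap so the recursion stops
        self.hops = 0

    def write(self, text):
        self.hops += 1
        if self.hops > self.max_hops:
            raise RuntimeError("log output was routed back into the logger")
        # Forwarding the text to the logger is what closes the loop.
        self.logger.log(text.rstrip("\n"))


def run_demo(max_hops=5):
    logger = ToyLogger()
    fake_stderr = ToyStreamWriter(logger, max_hops)
    # Models a handler that writes to the already-redirected stderr.
    logger.streams.append(fake_stderr)
    try:
        logger.log("hello")
    except RuntimeError as exc:
        return fake_stderr.hops, str(exc)
    return fake_stderr.hops, None
```

   Each message re-enters the writer it came from, so a single `log` call 
keeps bouncing until something external (here the cap) stops it.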
   
   It happens extremely rarely, so I think it makes little sense to add any 
workarounds without revamping the logging infrastructure; it would be a 
band-aid and not worth the effort. But if you find a solution that prevents 
this situation from happening (for example, failing when the redirection is 
detected) without unnecessary performance overhead on existing logging, then 
PRs are most welcome.
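
   One possible shape for such a check, offered only as a sketch: walk the 
handlers a logger would call and report any `logging.StreamHandler` whose 
stream is the very object that forwards to that logger. The function names 
are hypothetical and this is not Airflow code; where the check would run (for 
example once before a task starts) is left open, and it adds no work to 
individual log writes.

```python
import io
import logging


def find_feedback_handlers(logger, writer):
    """Return stream handlers reachable from `logger` that write to `writer`."""
    found = []
    current = logger
    while current is not None:
        for handler in current.handlers:
            if isinstance(handler, logging.StreamHandler) and handler.stream is writer:
                found.append(handler)
        # Mirror how records travel: stop once propagation is switched off.
        if not current.propagate:
            break
        current = current.parent
    return found


def fail_on_feedback(logger, writer):
    """Raise if output sent to `writer` would be logged back into `writer`."""
    looping = find_feedback_handlers(logger, writer)
    if looping:
        raise RuntimeError(
            f"{len(looping)} handler(s) on {logger.name!r} write to a stream "
            "that is redirected to the same logger"
        )


# Usage: `writer` stands in for the object installed as sys.stderr.
example_logger = logging.getLogger("example.task")
example_logger.propagate = False
writer = io.StringIO()
fail_on_feedback(example_logger, writer)  # no handler points at writer: passes
```

   The comparison is by identity (`is`), so it only catches a handler bound 
to the exact redirected object; a handler that reaches the same destination 
through another wrapper would go unnoticed.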
   
   > * There are cases where users don't have much control over the code 
(tasks) run in airflow.  It's also quite possible some weird import statement 
pulls in the unwanted redirection.  If airflow could detect such misuse, or 
even better, could separate its own log from users (sorry, I don't know how), 
that would be much nicer.
   
   Surely. It can happen, and it is impossible to detect unless you know exactly 
what to look for (for example, we have protection against infinite recursion in 
the secret masker). But there we know exactly what to do. If the user does 
something wrong and performs this (I just tried it myself):
   
   ```
   def task(s):
       # Recurses without a base case; the argument grows on every call.
       task(s + '   '*len(s))
   
   task(' ')
   ```
   
   there is not much one can do. This is Python. You can do anything. 
   
   > Anyway, just hope this would save others some time when they deploy the 
new versions.
   
   I hope so too. This is one of the legitimate uses of GitHub issues and 
discussions. They are searchable, and if someone has a similar issue, I hope 
they will find it. That might actually be the cheapest and most effective 
solution overall.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


