potiuk commented on PR #35677:
URL: https://github.com/apache/airflow/pull/35677#issuecomment-1839681016

   > Well this issue has been open for quite some time now.
   > 
   > The alternative would be to use follow=True. This will stop and hang the 
thread by definition. You'd need a parallel thread to check if the service has 
finished, as this always hangs no matter what the state of the service 
(Running, Started, Waiting, Done) . In the end you don't avoid polling. And you 
have to introduce additional code with parallelism which in Python isn't great, 
at least to my knowledge.
   
   We are already doing it for other operators (K8S, for example), and no, you 
do not have to poll the Airflow API. In many cases, when remote logging is 
involved, loggers just log to a remote logging service (CloudWatch, for 
example), which takes care of streaming logs to the UI. So yes, you can 
absolutely avoid polling. 
   
   You could likely use this: 
   
   ```
   since (datetime, int, or float) – Show logs since a given datetime, integer 
epoch (in seconds) or float (in nanoseconds)
   ```
   
   And ask the logs to include timestamps and use them.
   
   Especially if you use float (nanoseconds), you could record the timestamp of 
the last log line, maybe store the last few lines, add a few nanoseconds of 
overlap to the next request, and de-duplicate the overlapping lines. 
   
   This is very similar to your proposal, but it avoids potentially huge, 
ever-increasing traffic and the potentially huge amount of memory needed to 
hold the logs. The way you implemented it might waste MBs or even GBs of 
memory by keeping the whole log in memory.
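
   To make the idea concrete, here is a rough sketch of the bookkeeping (not 
the operator's actual code). `LogCursor`, `parse_ts` and the one-second overlap 
are hypothetical names and values chosen for illustration, and the sketch 
assumes every log line starts with a UTC timestamp ending in `Z`, as you would 
get when asking for timestamps:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp of a log line (assumed UTC, 'Z' suffix)."""
    stamp = line.split(" ", 1)[0].rstrip("Z")
    if "." in stamp:
        head, frac = stamp.split(".", 1)
        # datetime only holds microseconds, so truncate nanoseconds to 6 digits.
        stamp = f"{head}.{frac[:6].ljust(6, '0')}"
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


class LogCursor:
    """Hypothetical helper: remembers only the lines inside the overlap window."""

    def __init__(self, overlap: timedelta = timedelta(seconds=1)) -> None:
        self.overlap = overlap
        self.last_ts: Optional[datetime] = None
        self.seen: Dict[str, datetime] = {}  # raw line -> parsed timestamp

    def since(self) -> Optional[datetime]:
        """Value to pass as `since` on the next request (None on first call)."""
        if self.last_ts is None:
            return None
        return self.last_ts - self.overlap

    def new_lines(self, raw_lines: Iterable[str]) -> List[str]:
        """Return only lines not already emitted, then prune old state."""
        fresh: List[str] = []
        for raw in raw_lines:
            if raw in self.seen:
                continue  # duplicate caused by the deliberate overlap
            ts = parse_ts(raw)
            self.seen[raw] = ts
            fresh.append(raw)
            if self.last_ts is None or ts > self.last_ts:
                self.last_ts = ts
        cutoff = self.since()
        if cutoff is not None:
            # Keep memory bounded: forget everything older than the overlap.
            self.seen = {l: t for l, t in self.seen.items() if t >= cutoff}
        return fresh


# Two simulated fetches; the second one overlaps with the first.
cursor = LogCursor()
first = cursor.new_lines([
    "2023-12-04T10:00:00.000000001Z a",
    "2023-12-04T10:00:00.500000000Z b",
])
second = cursor.new_lines([
    "2023-12-04T10:00:00.000000001Z a",
    "2023-12-04T10:00:00.500000000Z b",
    "2023-12-04T10:00:02.000000000Z c",
])
```

   In real use, each fetch would pass `cursor.since()` as `since` and request 
timestamps from whichever logs call the operator uses. De-duplication here is 
by the full raw line (timestamp included), so two genuinely identical lines 
with the same timestamp would collapse into one; that trade-off is an 
assumption of this sketch.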


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

