potiuk commented on PR #35677: URL: https://github.com/apache/airflow/pull/35677#issuecomment-1839681016
> Well this issue has been open for quite some time now.
>
> The alternative would be to use follow=True. This will stop and hang the thread by definition. You'd need a parallel thread to check if the service has finished, as this always hangs no matter what the state of the service (Running, Started, Waiting, Done). In the end you don't avoid polling. And you have to introduce additional code with parallelism which in Python isn't great, at least to my knowledge.

We are already doing it for other operators (K8S), and no, you do not have to poll the Airflow API. In many cases, when remote logging is involved, loggers simply log to a remote logging service (CloudWatch, for example), which takes care of streaming the logs to the UI. So yes, you can absolutely avoid polling.

You could likely use this:

```
since (datetime, int, or float) – Show logs since a given datetime, integer epoch (in seconds) or float (in nanoseconds)
```

And ask the logs to include timestamps and use them. Especially if you use float (nanoseconds), you could record the timestamp of the last log line, maybe store the last few lines, add a few nanoseconds of overlap, and de-duplicate the overlapping lines.

This is very similar to your proposal, but it avoids potentially huge, ever-increasing traffic and the potentially huge amount of memory needed to keep the logs in memory. The way you implemented it might waste MBs or GBs of memory by keeping the whole log in memory.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
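The `since`-plus-overlap idea from the comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the PR: `IncrementalLogReader` and `fetch_since` are hypothetical names, `fetch_since(since)` stands in for whatever call returns log lines newer than `since` with timestamps enabled, and each line is assumed to be already parsed into a `(timestamp, message)` pair, with timestamps as plain floats in whatever unit the log source uses.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple

# Assumption: each log line is already parsed into (timestamp, message).
LogLine = Tuple[float, str]


class IncrementalLogReader:
    """Hypothetical helper: fetch only new log lines on each poll.

    Instead of re-reading the whole log, it remembers the newest timestamp
    seen so far, asks for lines slightly older than that (the overlap), and
    drops lines it has already returned.
    """

    def __init__(
        self,
        fetch_since: Callable[[float], Iterable[LogLine]],
        overlap: float = 1.0,
        remember: int = 100,
    ) -> None:
        self._fetch_since = fetch_since  # hypothetical log source
        self._overlap = overlap          # same unit as the timestamps
        self._last_ts = 0.0
        # Bounded buffer of recent lines, used only for de-duplication.
        self._recent: Deque[LogLine] = deque(maxlen=remember)

    def poll(self) -> List[LogLine]:
        """Return lines not seen in earlier polls."""
        since = max(0.0, self._last_ts - self._overlap)
        fresh: List[LogLine] = []
        for entry in self._fetch_since(since):
            if entry in self._recent:
                continue  # already returned by a previous poll
            self._recent.append(entry)
            self._last_ts = max(self._last_ts, entry[0])
            fresh.append(entry)
        return fresh
```

Memory stays bounded by `remember` rather than growing with the log. The trade-off is that `remember` has to be at least as large as the number of lines that can fall inside one overlap window, otherwise some duplicates slip through. Two genuinely distinct lines with an identical timestamp and message would also be collapsed into one.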
