asaf400 opened a new pull request #19030: URL: https://github.com/apache/airflow/pull/19030
Hello, I would like some opinions on this PR, it has no open issue, but it's something I have encountered and fix locally in Airflow 2.0.2, and the code hasn't change in main branch. Please note this PR has two main changes: - adding a missing feature, aligning functionality with similar Operator - refactoring for efficiency and responsivness sake 1. DockerOperator has** the option to send the entire container log into xcom return_value, not just the last line EcsOperator doesn't have this option, it defaults to only sending the last line, This commit changes that behaviour, adding xcom_all parameter to the Operator, to enable sending all logs lines. 2. In addition, the get_last_log_message function has been refactored, In an edge case where the log is bigger than 1 MB (up to 10,000 log events) as per aws docs, it is possible that the function would cause more than one 'GetLogEvents' AWS API call because it defaults to read the logstream/group from beginning, and using deque with queue size of 1, it would iterate over the entire generator yields, calling get_log_events as much as it needs, just so that when the generator is exhausted, the queue would keep only the last event, this is a waste of time and data transfer, AWS get_log_events method has start_from_head parameter, which can be used to actually tail the event log instead of reading from head! in combination with a single next() yield, we can ensure we only call get_log_events once, and within the response, grab the last message ** DockerOperator is actually also unstable in regard to xcoms, see issue #18874 and PR #19027 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
