uranusjr commented on a change in pull request #19027:
URL: https://github.com/apache/airflow/pull/19027#discussion_r730539747



##########
File path: airflow/providers/docker/operators/docker.py
##########
@@ -304,21 +304,24 @@ def _run_image_with_mounts(self, target_mounts, add_tmp_variable: bool) -> Optio
             working_dir=self.working_dir,
             tty=self.tty,
         )
-        lines = self.cli.attach(container=self.container['Id'], stdout=True, stderr=True, stream=True)
+        logstream = self.cli.attach(container=self.container['Id'], stdout=True, stderr=True, stream=True)
         try:
             self.cli.start(self.container['Id'])
 
-            line = ''
+            log_chunk = ''
             res_lines = []
             return_value = None
-            for line in lines:
-                if hasattr(line, 'decode'):
-                    # Note that lines returned can also be byte sequences so we have to handle decode here
-                    line = line.decode('utf-8')
-                line = line.strip()
-                res_lines.append(line)
-                self.log.info(line)
+            for log_chunk in logstream:
+                if hasattr(log_chunk, 'decode'):
+                    # Note that log_chunk returned can also be byte sequences so we have to handle decode here
+                    log_chunk = log_chunk.decode('utf-8')
+                log_chunk = log_chunk.strip()
+                res_lines.append(log_chunk)
+                self.log.info(log_chunk)
             result = self.cli.wait(self.container['Id'])
+            # after container has exited, grab the entire log ignoring the chunked log stream that was used with attach
+            # self.cli.logs uses docker's /containers/{id}/logs, while self.cli.attach uses /containers/{id}/attach
+            lines = self.cli.logs(container=self.container['Id'], stdout=True, stderr=True, stream=True)

Review comment:
       I feel we should utilise the `tail` argument to fetch less data when
only the last line is needed. Or even skip this additional call entirely if
`retrieve_output` is set and the container exited successfully.
   
   `_get_return_value_from_logs` can be removed entirely (it is only used
once here), and it's "private" so we are allowed to break it (or we can always
bump the major version of the Docker provider if needed).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


