akki commented on a change in pull request #9464:
URL: https://github.com/apache/airflow/pull/9464#discussion_r509083348
##########
File path: airflow/providers/docker/operators/docker.py
##########
@@ -256,29 +257,34 @@ def _run_image(self) -> Optional[str]:
lines = self.cli.attach(container=self.container['Id'],
stdout=True, stderr=True, stream=True)
- self.cli.start(self.container['Id'])
+ def gen_output(stdout=False, stderr=False):
+ return (
Review comment:
Hi
I don't think using a generator instead of a list solves the memory
issue here. The data still ends up in memory whether you use a list or
a generator.
I did a small test to verify this; I ran the following code in a Python3
terminal:
```
>>> with open('yes.log', 'r') as file_:
...     x = (line for line in file_.read())
...
```
where `yes.log` was a 650 MB file.
As soon as this code block finished executing, the memory used by the
process grew by 650 MB. That makes sense: `file_.read()` loads the entire
file into memory as a single string before the generator expression even
starts iterating, so the data has nowhere to live but memory.
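To make the comparison concrete, here is a minimal sketch (using `io.StringIO` to stand in for the file, since that keeps it self-contained) of the difference between wrapping an eager `read()` in a generator and iterating the file object itself, which is lazy:

```python
import io

# f.read() materializes the whole "file" as one string up front,
# so wrapping the result in a generator saves nothing.
eager = io.StringIO("line1\nline2\nline3\n")
data = eager.read()           # entire contents now in memory
gen = (ch for ch in data)     # generator over an already-loaded string

# Iterating the file object itself is lazy: lines are read on demand.
lazy = io.StringIO("line1\nline2\nline3\n")
lines = (line for line in lazy)   # no bulk read happens here
first = next(lines)               # reads just the first line
```
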
I think you might be able to achieve what you're after by streaming the
logs and using `yield`, so that only one line at a time is held in memory.
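For reference, a hypothetical sketch of what I mean by streaming with `yield` (the `stream_lines` helper and the byte-chunk input are my own illustration, not part of the PR or the docker-py API):

```python
def stream_lines(chunks):
    # Hypothetical helper: lazily decode and yield complete lines from an
    # iterable of byte chunks (e.g. a streamed log attachment), holding
    # only the current partial line in memory at any point.
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, _, buffer = buffer.partition(b"\n")
            yield line.decode("utf-8", errors="replace")
    if buffer:  # flush any trailing line without a newline
        yield buffer.decode("utf-8", errors="replace")

# Nothing is read or decoded until the generator is consumed:
out = list(stream_lines([b"hel", b"lo\nwor", b"ld\n"]))
# out == ["hello", "world"]
```

The caller can then iterate the generator and log or discard each line as it arrives, instead of accumulating everything first.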
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]