akki commented on a change in pull request #9464:
URL: https://github.com/apache/airflow/pull/9464#discussion_r509083348
##########
File path: airflow/providers/docker/operators/docker.py
##########
@@ -256,29 +257,34 @@ def _run_image(self) -> Optional[str]:
lines = self.cli.attach(container=self.container['Id'],
stdout=True, stderr=True, stream=True)
- self.cli.start(self.container['Id'])
+ def gen_output(stdout=False, stderr=False):
+ return (
Review comment:
Hi
I don't think using a generator instead of a list solves the memory
issue here. The data still ends up in memory whether you use a list or
a generator.
I did a small test to verify this; I ran the following code in a Python3
terminal:
```
>>> with open('yes.log', 'r') as file_:
...     x = (line for line in file_.read())
...
```
where `yes.log` was a 650 MB file.
As soon as this code block finished executing, the memory used by the
process grew by 650 MB. That makes sense: `file_.read()` loads the entire
file into memory as a single string before the generator expression even
starts iterating, so the data has nowhere to live but memory.
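To make the comparison concrete, here is a minimal sketch (using `io.StringIO` to stand in for the file, since that keeps it self-contained) of the difference between wrapping an eager `read()` in a generator and iterating the file object itself, which is lazy:

```python
import io

# f.read() materializes the whole "file" as one string up front,
# so wrapping the result in a generator saves nothing.
eager = io.StringIO("line1\nline2\nline3\n")
data = eager.read()           # entire contents now in memory
gen = (ch for ch in data)     # generator over an already-loaded string

# Iterating the file object itself is lazy: lines are read on demand.
lazy = io.StringIO("line1\nline2\nline3\n")
lines = (line for line in lazy)   # no bulk read happens here
first = next(lines)               # reads just the first line
```
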
I think you might be able to achieve what you're after by streaming the
logs and using `yield`, so that only one line at a time is held in memory.
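For reference, a hypothetical sketch of what I mean by streaming with `yield` (the `stream_lines` helper and the byte-chunk input are my own illustration, not part of the PR or the docker-py API):

```python
def stream_lines(chunks):
    # Hypothetical helper: lazily decode and yield complete lines from an
    # iterable of byte chunks (e.g. a streamed log attachment), holding
    # only the current partial line in memory at any point.
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, _, buffer = buffer.partition(b"\n")
            yield line.decode("utf-8", errors="replace")
    if buffer:  # flush any trailing line without a newline
        yield buffer.decode("utf-8", errors="replace")

# Nothing is read or decoded until the generator is consumed:
out = list(stream_lines([b"hel", b"lo\nwor", b"ld\n"]))
# out == ["hello", "world"]
```

The caller can then iterate the generator and log or discard each line as it arrives, instead of accumulating everything first.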
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]