GitHub user michaelrommel created a discussion: Huge memory consumption, when recv+streaming 1k chunks to S3
Hi, in [this](https://github.com/michaelrommel/leaktest.git) reproducible example, I have a small program that streams a file over TCP using tokio. As part of investigating a problem, I wanted to simulate high-latency networks with small packet sizes, so I chose to read the data in 1 KiB chunks, send them over the wire, and hand them off to an OpenDAL writer. It works, but for the life of me I cannot figure out why the memory consumption is so high: instead of the expected 10-12 MB, peak allocated memory is close to 1.1 GB, roughly a factor of 100x. I understand that in OpenDAL the incoming chunks of data are stored in a VecDeque until the 5 MB minimum multipart threshold from AWS S3 is reached, then a part is sent off, and so on.

I tested this with an 18 MB file, and this is the allocation graph for different chunk sizes:

<img width="2341" height="1369" alt="image" src="https://github.com/user-attachments/assets/aeb31616-bfb6-4028-bc5f-bb62aed90a76" />

The lines labelled 1k-1M are all sender.rs/receiver.rs test runs. Then I took the network out of the picture and just read the file in the same small chunks and handed it to OpenDAL: everything is fine there. It is also fine if I run the upload in a tokio background task.

Can somebody point out the probably simple and stupid mistake I am making when those small packets are retrieved from the network? Is there some allocation from tokio or the network stack that stays attached to the incoming buffer and is not released until the references from the oio queue are dropped? I have been debugging this for days and cannot find a good explanation for why this happens.

Thanks in advance, Michael.

GitHub link: https://github.com/apache/opendal/discussions/7200
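The hypothesis raised above (a slice of the incoming buffer keeping the whole backing allocation alive until the oio queue drops it) can be illustrated with a minimal, std-only sketch. This is an assumption, not a confirmed diagnosis: the `Slice` type below is a hypothetical stand-in for a `bytes::Bytes`-style view, and the 64 KiB backing-buffer size is invented for illustration. If each 1 KiB chunk queued for upload is a view into a larger shared read buffer, the retained memory scales with the backing buffers, not with the payload:

```rust
use std::sync::Arc;

// Hypothetical stand-in for a `bytes::Bytes`-style slice: a small view
// into a shared, reference-counted backing buffer. The backing Vec
// cannot be freed while any Slice into it is alive.
struct Slice {
    backing: Arc<Vec<u8>>,
    len: usize,
}

// Assumed scenario (numbers are illustrative): the receive path hands
// out `backing_size` read buffers, but only a `chunk_size` view of each
// is queued, the way chunks sit in the write queue awaiting the 5 MB
// threshold. Returns (queued payload bytes, retained backing bytes).
fn simulate(chunks: usize, backing_size: usize, chunk_size: usize) -> (usize, usize) {
    let mut queue: Vec<Slice> = Vec::with_capacity(chunks);
    for _ in 0..chunks {
        let backing = Arc::new(vec![0u8; backing_size]);
        queue.push(Slice { backing, len: chunk_size });
    }
    let payload = queue.iter().map(|s| s.len).sum();
    let retained = queue.iter().map(|s| s.backing.len()).sum();
    (payload, retained)
}

fn main() {
    // 1024 queued chunks of 1 KiB, each pinning a 64 KiB backing buffer.
    let (payload, retained) = simulate(1024, 64 * 1024, 1024);
    println!("queued payload:  {} bytes", payload);  // 1 MiB of useful data
    println!("retained memory: {} bytes", retained); // 64 MiB kept alive
}
```

If something like this is what happens, copying each chunk into a fresh allocation (e.g. `Bytes::copy_from_slice`) before handing it to the writer would break the link to the backing buffer; whether that reproduces or fixes the 1.1 GB peak would confirm or rule out the hypothesis.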
