GitHub user michaelrommel created a discussion: Huge memory consumption, when 
recv+streaming 1k chunks to S3

Hi,

in [this](https://github.com/michaelrommel/leaktest.git) reproducible example, 
I have a small program that streams a file over TCP using tokio async I/O. As 
part of investigating a problem, I wanted to simulate high-latency networks 
with small packet sizes, so I chose to read the data in 1 KiB chunks, send 
them over the wire, and hand them off to an OpenDAL writer.

It works, but for the life of me I cannot figure out why the memory 
consumption is so high: instead of the expected 10-12 MB, the peak allocated 
memory is close to 1.1 GB, a factor of roughly 100x.

I understand that in OpenDAL the incoming chunks of data are stored in a 
VecDeque until the 5 MB minimum multipart part size for AWS S3 is reached, 
then that part is sent off, and so on. I tested this with an 18 MB file, and 
this is the allocation graph for different chunk sizes:
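For clarity, here is my mental model of that accumulation as a std-only sketch. The part handling is made up (the real oio writer is more involved), but with chunks that own their data the peak queued memory should be bounded by the 5 MB part size:

```rust
use std::collections::VecDeque;

const CHUNK: usize = 1024;                // 1 KiB reads
const THRESHOLD: usize = 5 * 1024 * 1024; // AWS S3 multipart minimum part size

// Accumulate owned 1 KiB chunks until one multipart part is full,
// then drain the queue (simulated upload) and continue.
// Returns the peak number of bytes queued at any point.
fn stream(total: usize) -> usize {
    let mut queue: VecDeque<Vec<u8>> = VecDeque::new();
    let mut queued = 0;
    let mut peak = 0;
    let mut sent = 0;
    while sent + queued < total {
        queue.push_back(vec![0u8; CHUNK]); // an owned 1 KiB chunk
        queued += CHUNK;
        peak = peak.max(queued);
        if queued >= THRESHOLD {
            // simulate flushing one 5 MiB part to S3
            sent += queued;
            queue.clear();
            queued = 0;
        }
    }
    peak
}

fn main() {
    // the 18 MB test file, fed in 1 KiB chunks
    let peak = stream(18 * 1024 * 1024);
    println!("peak queued: {} MiB", peak / (1024 * 1024));
}
```

That is roughly what I see when I feed OpenDAL directly from the file: peak memory on the order of one part, not hundreds of megabytes.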

<img width="2341" height="1369" alt="Allocation graph for different chunk sizes" 
src="https://github.com/user-attachments/assets/aeb31616-bfb6-4028-bc5f-bb62aed90a76" />

The first lines, 1k-1M, are all sender.rs/receiver.rs test runs. Then I took 
the network out of the picture and just read the file in the same small 
chunks and handed them to OpenDAL: everything is fine there. Likewise, if I 
run that in a tokio background task, all is well.

Can somebody point out the probably simple and stupid mistake I am making 
when those small packets are retrieved from the network? Is there some 
allocation from tokio or the network stack that is still attached to the 
incoming buffer and not released until the references in the oio queue are 
dropped?

I have been debugging this for days now and cannot find a good explanation 
for why this happens.

Thanks in advance,

  Michael.

GitHub link: https://github.com/apache/opendal/discussions/7200
