GitHub user alexwilcoxson-rel closed a discussion: Memory usage during Parquet 
scan

Hi DataFusion community! I'm relatively new to Arrow, DataFusion, and Rust and 
had a question about how parquet records are streamed.

I have an Arrow Flight service in front of DataFusion. When scanning a 50GB 
table made of physical parquet files ranging from ~256MB to ~1GB each, I see 
memory usage around 1-2GB. I get the sense that as the arrow batches are 
yielded to the outbound flight stream, the used memory is not dropped until a 
given file has been completely scanned?

Is this accurate?

I'm asking because as we look to support many concurrent requests, each 
request holding onto a whole file requires a lot of memory. When we reduced 
the parquet files to ~32MB each we could support many more requests, but I was 
wondering whether there is a DataFusion configuration change we could make, or 
a feature we could request, to keep memory usage low when streaming larger 
files.

We are using DataFusion version 19.

Thanks!

GitHub link: https://github.com/apache/datafusion/discussions/5901

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: 
[email protected]
