alamb opened a new issue, #5851:
URL: https://github.com/apache/arrow-rs/issues/5851

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   When writing parquet files, depending on the writer settings and the data 
being written, we have observed the ArrowWriter consuming large amounts of 
memory (10s of GB) -- see https://github.com/apache/arrow-rs/issues/5828
   
   The memory usage of parquet writers also often comes up in the context of 
proposals for new parquet formats.
   
   There is already a discussion of how to limit memory when writing here: 
https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html#memory-limiting
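   
   For reference, the approach described there is to periodically call `flush()` so buffered data is written out as a completed row group. A rough sketch of that pattern (the row-count threshold and the `batches` helper below are illustrative, not part of the parquet API):
   
   ```rust
   use std::fs::File;
   use std::sync::Arc;
   
   use arrow::array::{ArrayRef, Int64Array};
   use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
   use arrow::record_batch::RecordBatch;
   use parquet::arrow::ArrowWriter;
   
   fn main() -> Result<(), Box<dyn std::error::Error>> {
       let schema: SchemaRef =
           Arc::new(Schema::new(vec![Field::new("v", DataType::Int64, false)]));
       let file = File::create("example.parquet")?;
       let mut writer = ArrowWriter::try_new(file, schema.clone(), None)?;
   
       // Illustrative threshold: flush a row group once this many rows are buffered
       let max_buffered_rows = 1_000_000;
       let mut buffered_rows = 0;
   
       for batch in batches(&schema) {
           writer.write(&batch)?;
           buffered_rows += batch.num_rows();
           if buffered_rows >= max_buffered_rows {
               // Write the buffered data out as a completed row group,
               // freeing the writer's internal buffers
               writer.flush()?;
               buffered_rows = 0;
           }
       }
       writer.close()?;
       Ok(())
   }
   
   /// Stand-in for an application's source of record batches
   fn batches(schema: &SchemaRef) -> Vec<RecordBatch> {
       let values: ArrayRef = Arc::new(Int64Array::from(vec![1, 2, 3]));
       vec![RecordBatch::try_new(schema.clone(), vec![values]).unwrap()]
   }
   ```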
   
   However, there is currently no way to get a measurement of the writer's 
actual memory use (which we could use to abort the write, for example). 
   
   **Describe the solution you'd like**
   
   I would like some way to get visibility into the current memory usage of 
the internal buffering in the parquet writer.
   
   
   **Describe alternatives you've considered**
   I propose adding a function to 
[ArrowWriter](https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html#)
 modeled on 
[Array::get_array_memory_size](https://docs.rs/arrow/latest/arrow/array/trait.Array.html#tymethod.get_array_memory_size)
   
   ```rust
   impl ArrowWriter {
     /// Returns an estimate of how much memory the
     /// writer is currently using in its internal buffers.
     pub fn memory_size(&self) -> usize { ... }
   ...
   }
   ```
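   
   A caller could then use such a method between writes to bound buffering, for example (a sketch only; `memory_size` is the proposed method, and the limit and helper name are illustrative):
   
   ```rust
   use std::fs::File;
   
   use parquet::arrow::ArrowWriter;
   use parquet::errors::Result;
   
   /// Hypothetical guard using the proposed `memory_size` method:
   /// once the writer's internal buffers exceed `limit` bytes, flush
   /// them out as a row group (a caller could instead choose to abort).
   fn enforce_memory_limit(writer: &mut ArrowWriter<File>, limit: usize) -> Result<()> {
       if writer.memory_size() > limit {
           writer.flush()?;
       }
       Ok(())
   }
   ```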
   
   **Additional context**
   Here is one ticket that describes a non-trivial source of memory usage 
(https://github.com/apache/arrow-rs/issues/5828), so the memory used for those 
indices should be included in the estimate. 
   

