tustvold commented on issue #6839:
URL: https://github.com/apache/arrow-rs/issues/6839#issuecomment-2521731031
The issue is that `GenericColumnWriter::memory_size` does not account for the
`data_pages` it has buffered while waiting for the dictionary page to be flushed.
This should be a relatively straightforward case of changing it to be
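```rust
pub(crate) fn memory_size(&self) -> usize {
    // Bytes held in data pages buffered until the dictionary page is flushed
    self.data_pages.iter().map(|x| x.data().len()).sum::<usize>()
        // Bytes already written for this column chunk
        + self.column_metrics.total_bytes_written as usize
        // Estimated memory currently held by the encoder
        + self.encoder.estimated_memory_size()
}
```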
and adding an appropriate test.
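To illustrate why the buffered pages matter, here is a minimal, self-contained sketch of the accounting. It does not use arrow-rs. `ColumnWriterModel` and its fields are hypothetical stand-ins: page bodies are modelled as plain `Vec<u8>` rather than compressed pages, and `encoder_estimate` stands in for `encoder.estimated_memory_size()`. The sketch compares the estimate with and without the buffered pages, which is roughly the kind of check a test for this fix might make.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified model of a column writer (NOT the arrow-rs
/// `GenericColumnWriter`); page bodies are modelled as raw bytes.
struct ColumnWriterModel {
    /// Data pages buffered while the dictionary page is still pending.
    data_pages: VecDeque<Vec<u8>>,
    /// Bytes already written for this column chunk.
    total_bytes_written: u64,
    /// Stand-in for the encoder's estimated memory size.
    encoder_estimate: usize,
}

impl ColumnWriterModel {
    /// Accounting that ignores buffered pages (the behaviour reported in the issue).
    fn memory_size_without_pages(&self) -> usize {
        self.total_bytes_written as usize + self.encoder_estimate
    }

    /// Accounting that also sums the buffered page bytes (the proposed fix).
    fn memory_size(&self) -> usize {
        self.data_pages.iter().map(|p| p.len()).sum::<usize>()
            + self.total_bytes_written as usize
            + self.encoder_estimate
    }
}

fn main() {
    let mut writer = ColumnWriterModel {
        data_pages: VecDeque::new(),
        total_bytes_written: 0,
        encoder_estimate: 64,
    };

    // Buffer three 1 KiB pages while the dictionary page has not been flushed.
    for _ in 0..3 {
        writer.data_pages.push_back(vec![0u8; 1024]);
    }

    // prints "without pages: 64"
    println!("without pages: {}", writer.memory_size_without_pages());
    // prints "with pages: 3136"
    println!("with pages: {}", writer.memory_size());
}
```

Without the buffered pages, the estimate stays at the encoder's 64 bytes even though 3 KiB of page data is being held in memory.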