Kontinuation opened a new pull request, #515:
URL: https://github.com/apache/sedona-db/pull/515

   This patch improves the accuracy of memory usage estimation by implementing 
our own functions for estimating the in-memory sizes of record batches and 
Arrow arrays.
   
   The rationale is similar to https://github.com/apache/datafusion/pull/13377. 
If we call `RecordBatch::get_array_memory_size` instead of rolling our own 
memory usage estimation function, we get wildly inaccurate numbers for 
spilled batches read back using `arrow::ipc::reader::StreamReader`.
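
   To illustrate the kind of error involved, here is a minimal std-only sketch. It does not use arrow-rs and is not the code in this patch: `BufferView`, `Array`, and the three estimator functions are hypothetical stand-ins. It assumes the inaccuracy comes from several arrays holding views into one shared allocation (as can happen when a batch is decoded from a single IPC message body), so an estimator that charges each view the full allocation size counts the same bytes repeatedly.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow buffer: a view into a shared allocation.
struct BufferView {
    /// The backing allocation, possibly shared with other views.
    allocation: Arc<[u8]>,
    /// Number of bytes this view logically covers.
    len: usize,
}

/// Hypothetical stand-in for an array: just the buffers it references.
struct Array {
    buffers: Vec<BufferView>,
}

/// Naive estimate: charges every view the full size of its backing
/// allocation, so a shared allocation is counted once per view.
fn naive_size(arrays: &[Array]) -> usize {
    arrays
        .iter()
        .flat_map(|a| a.buffers.iter())
        .map(|b| b.allocation.len())
        .sum()
}

/// Deduplicated estimate: counts each distinct allocation once,
/// using the allocation's start address as its identity.
fn deduplicated_size(arrays: &[Array]) -> usize {
    let mut seen: HashSet<*const u8> = HashSet::new();
    let mut total = 0;
    for buf in arrays.iter().flat_map(|a| a.buffers.iter()) {
        if seen.insert(buf.allocation.as_ptr()) {
            total += buf.allocation.len();
        }
    }
    total
}

/// Logical estimate: sums only the bytes each view actually covers.
fn logical_size(arrays: &[Array]) -> usize {
    arrays
        .iter()
        .flat_map(|a| a.buffers.iter())
        .map(|b| b.len)
        .sum()
}

/// Three arrays, each viewing 256 bytes of one shared 1024-byte allocation.
fn demo() -> (usize, usize, usize) {
    let body: Arc<[u8]> = Arc::from(vec![0u8; 1024]);
    let arrays: Vec<Array> = (0..3)
        .map(|_| Array {
            buffers: vec![BufferView {
                allocation: Arc::clone(&body),
                len: 256,
            }],
        })
        .collect();
    (
        naive_size(&arrays),
        deduplicated_size(&arrays),
        logical_size(&arrays),
    )
}
```

   In this toy setup the naive estimate is three times the real allocation, and the gap grows with the number of arrays sharing the allocation. Which of the other two numbers is the right one to report depends on whether the accounting should reflect memory actually held or data logically referenced.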
   
   Future work: use the memory pool API of arrow-rs for more accurate memory 
usage accounting. See https://github.com/apache/arrow-rs/issues/8137.


