On Tue, 31 Aug 2021 21:46:23 -0700
Rares Vernica <rvern...@gmail.com> wrote:
>
> I'm storing RecordBatch objects in a local cache to improve performance. I
> want to keep track of the memory usage to stay within bounds. The arrays
> stored in the batch are not nested.
>
> The best way I came up to compute the size of a RecordBatch is:
>
>         size_t arrowSize = 0;
>         for (auto i = 0; i < arrowBatch->num_columns(); ++i) {
>             auto column = arrowBatch->column_data(i);
>             if (column->buffers[0])
>                 arrowSize += column->buffers[0]->size();
>             if (column->buffers[1])
>                 arrowSize += column->buffers[1]->size();
>         }
>
> Does this look reasonable? I guess we are over estimating a bit due to the
> buffer alignment but that should be fine.
Probably, but you should iterate over all buffers instead of selecting
just buffers 0 and 1 (what if you have a string column? It has three
buffers: the validity bitmap, the offsets and the character data, so
buffer 2 would be missed). So basically:

```
size_t arrowSize = 0;
for (const auto& column : batch->columns()) {
  for (const auto& buffer : column->data()->buffers) {
    if (buffer) arrowSize += buffer->size();
  }
}
```

Regards

Antoine.
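To make the string-column point concrete, here is a small self-contained sketch. It does not use Arrow itself: `FakeBuffer`, `FakeArrayData`, `MakeStringColumn`, `SizeFirstTwoBuffers` and `SizeAllBuffers` are hypothetical stand-ins, loosely modelled on how `arrow::ArrayData::buffers` lays out a string column (validity bitmap, int32 offsets, character data). It compares counting only buffers 0 and 1 with counting every non-null buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT the Arrow API: they only mimic the
// shape of arrow::Buffer / arrow::ArrayData for illustration.
struct FakeBuffer {
  std::vector<uint8_t> bytes;
  int64_t size() const { return static_cast<int64_t>(bytes.size()); }
};

struct FakeArrayData {
  // [0] validity bitmap (may be null), then type-specific buffers.
  std::vector<std::shared_ptr<FakeBuffer>> buffers;
};

// Approach from the question: only buffers 0 and 1 are counted.
int64_t SizeFirstTwoBuffers(const FakeArrayData& column) {
  int64_t total = 0;
  for (std::size_t i = 0; i < 2 && i < column.buffers.size(); ++i) {
    if (column.buffers[i]) total += column.buffers[i]->size();
  }
  return total;
}

// Suggested approach: every non-null buffer is counted.
int64_t SizeAllBuffers(const FakeArrayData& column) {
  int64_t total = 0;
  for (const auto& buffer : column.buffers) {
    if (buffer) total += buffer->size();
  }
  return total;
}

// Builds a string-like column: no validity bitmap (no nulls),
// an int32 offsets buffer and a character data buffer.
FakeArrayData MakeStringColumn(const std::vector<std::string>& values) {
  std::vector<int32_t> offsets{0};
  std::string data;
  for (const auto& v : values) {
    data += v;
    offsets.push_back(static_cast<int32_t>(data.size()));
  }
  // The last offset must equal the total length of the character data.
  assert(static_cast<std::size_t>(offsets.back()) == data.size());

  auto offsets_buf = std::make_shared<FakeBuffer>();
  offsets_buf->bytes.resize(offsets.size() * sizeof(int32_t));
  std::memcpy(offsets_buf->bytes.data(), offsets.data(),
              offsets_buf->bytes.size());

  auto data_buf = std::make_shared<FakeBuffer>();
  data_buf->bytes.assign(data.begin(), data.end());

  FakeArrayData column;
  column.buffers = {nullptr, offsets_buf, data_buf};
  return column;
}
```

For `MakeStringColumn({"foo", "bar", "baz"})` the offsets buffer holds four int32 values (16 bytes) and the data buffer holds 9 bytes, so `SizeFirstTwoBuffers` reports 16 while `SizeAllBuffers` reports 25: the characters themselves are only counted by the loop over all buffers. Note also that in Arrow, `Buffer::size()` is the logical size, while `Buffer::capacity()` reports the allocated size, which may include padding.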