On Tue, 31 Aug 2021 21:46:23 -0700
Rares Vernica <[email protected]> wrote:
>
> I'm storing RecordBatch objects in a local cache to improve performance. I
> want to keep track of the memory usage to stay within bounds. The arrays
> stored in the batch are not nested.
>
> The best way I came up with to compute the size of a RecordBatch is:
>
> size_t arrowSize = 0;
> for (auto i = 0; i < arrowBatch->num_columns(); ++i) {
>   auto column = arrowBatch->column_data(i);
>   if (column->buffers[0])
>     arrowSize += column->buffers[0]->size();
>   if (column->buffers[1])
>     arrowSize += column->buffers[1]->size();
> }
>
> Does this look reasonable? I guess we are overestimating a bit due to the
> buffer alignment, but that should be fine.

Probably, but you should iterate over all buffers instead of
selecting just buffers 0 and 1. A string column, for example, has
three buffers (validity bitmap, offsets, and character data), so its
data buffer would be missed.
So basically:
```
size_t arrowSize = 0;
for (const auto& column : batch->columns()) {
  for (const auto& buffer : column->data()->buffers) {
    if (buffer)
      arrowSize += buffer->size();
  }
}
```
Regards
Antoine.
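
To make the difference between the two loops concrete, here is a minimal
sketch that compiles without Arrow. The `Buffer` and `ArrayData` types are
hypothetical stand-ins that only mimic the shape of the Arrow classes, and
the function names and byte counts are made up for illustration (a 4-value
int32 column without nulls, and a 4-value string column with a validity
bitmap, 32-bit offsets, and 11 bytes of character data).

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins mimicking the shape of arrow::Buffer and
// arrow::ArrayData. These are NOT the real Arrow classes.
struct Buffer {
  std::int64_t size_bytes;
  std::int64_t size() const { return size_bytes; }
};

struct ArrayData {
  // Entries may be null, e.g. when a column has no validity bitmap.
  std::vector<std::shared_ptr<Buffer>> buffers;
};

// Counts only buffers 0 and 1 of each column, as in the original question.
std::size_t SizeOfFirstTwoBuffers(const std::vector<ArrayData>& columns) {
  std::size_t total = 0;
  for (const auto& column : columns) {
    for (std::size_t i = 0; i < 2 && i < column.buffers.size(); ++i) {
      if (column.buffers[i])
        total += static_cast<std::size_t>(column.buffers[i]->size());
    }
  }
  return total;
}

// Counts every non-null buffer of each column, as suggested in the reply.
std::size_t SizeOfAllBuffers(const std::vector<ArrayData>& columns) {
  std::size_t total = 0;
  for (const auto& column : columns) {
    for (const auto& buffer : column.buffers) {
      if (buffer)
        total += static_cast<std::size_t>(buffer->size());
    }
  }
  return total;
}

// Builds the illustrative two-column example described above.
std::vector<ArrayData> MakeExampleColumns() {
  ArrayData int32_column;
  // No validity bitmap, 4 values * 4 bytes of data.
  int32_column.buffers = {nullptr, std::make_shared<Buffer>(Buffer{16})};

  ArrayData string_column;
  // 1-byte validity bitmap, (4 + 1) * 4 bytes of offsets, 11 bytes of data.
  string_column.buffers = {std::make_shared<Buffer>(Buffer{1}),
                           std::make_shared<Buffer>(Buffer{20}),
                           std::make_shared<Buffer>(Buffer{11})};

  return {int32_column, string_column};
}
```

With these numbers, the two-buffer loop reports 37 bytes while the
all-buffers loop reports 48 bytes: the 11 bytes of string data sit in the
third buffer and are skipped by the first version.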