[
https://issues.apache.org/jira/browse/ARROW-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352519#comment-17352519
]
David Li commented on ARROW-12888:
----------------------------------
Also note that there are a few caveats. The Python/Cython method ({{nbytes}}) computes
size by summing the sizes of the underlying buffers, but this can be misleading
when slices come into play.
{noformat}
>>> import pyarrow as pa
>>> pa.record_batch([range(1024), range(1024)], names="ab")
pyarrow.RecordBatch
a: int64
b: int64
>>> batch = pa.record_batch([range(1024), range(1024)], names="ab")
>>> batch.nbytes
16384
>>> batch[512:].nbytes
16384
{noformat}
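To make the caveat concrete: slicing is zero-copy, so a slice's columns reference the very same buffers as the parent batch (only the array offset changes), and a buffer-summing size method therefore reports the full allocation. A quick sketch, using {{slice()}} (equivalent to subscript slicing):
{noformat}
import pyarrow as pa

# Build a batch and take a zero-copy slice of its second half.
batch = pa.record_batch([list(range(1024)), list(range(1024))],
                        names=["a", "b"])
sliced = batch.slice(512)

# The slice's data buffer is the parent's data buffer; only the
# logical offset differs, not the underlying allocation.
parent_data = batch.column(0).buffers()[1]
slice_data = sliced.column(0).buffers()[1]
print(parent_data.address == slice_data.address)  # prints True
{noformat}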
That is, should "size" mean the size in memory (as I understand it, that's what's
desired here), or the serialized size?
There is a method that'll compute the actual serialized size, but it works by
actually serializing the data, so it's slower:
{noformat}
>>> pa.ipc.get_record_batch_size(batch)
16576
>>> pa.ipc.get_record_batch_size(batch[512:])
8384
{noformat}
> Implement arrow::Table::GetSizeInBytes()
> ----------------------------------------
>
> Key: ARROW-12888
> URL: https://issues.apache.org/jira/browse/ARROW-12888
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Percy Camilo Triveño Aucahuasi
> Priority: Major
>
> Implement arrow::Table::GetSizeInBytes()
--
This message was sent by Atlassian Jira
(v8.3.4#803005)