[ https://issues.apache.org/jira/browse/ARROW-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352519#comment-17352519 ]

David Li commented on ARROW-12888:
----------------------------------

Also note that there are a few caveats. The Python/Cython method computes size by
summing up the sizes of the buffers, but this can be misleading when slices come
into play.
{noformat}
>>> import pyarrow as pa
>>> pa.record_batch([range(1024), range(1024)], names="ab")
pyarrow.RecordBatch
a: int64
b: int64
>>> batch = pa.record_batch([range(1024), range(1024)], names="ab")
>>> batch.nbytes
16384
>>> batch[512:].nbytes
16384
{noformat}
i.e. is "size" the size in memory (AIUI, that's what's desired here), or the
serialized size?

There is a method that computes the actual serialized size, but it works by
actually serializing the data, so it's slower:
{noformat}
>>> pa.ipc.get_record_batch_size(batch)
16576
>>> pa.ipc.get_record_batch_size(batch[512:])
8384
{noformat}

> Implement arrow::Table::GetSizeInBytes()
> ----------------------------------------
>
>                 Key: ARROW-12888
>                 URL: https://issues.apache.org/jira/browse/ARROW-12888
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Percy Camilo Triveño Aucahuasi
>            Priority: Major
>
> Implement arrow::Table::GetSizeInBytes()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)