[ 
https://issues.apache.org/jira/browse/ARROW-15065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462464#comment-17462464
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15065:
--------------------------------------------------

[~jorisvandenbossche] and [~westonpace] 

I looked into the code for this change and have a few questions about the 
functions that need to be exposed to Python. 

As far as I understand, the following functions are not exposed to Python yet. 
Please correct me if I am wrong. 

```c++
/// Dictionary arrays will always be counted in their entirety
/// even if the array only references a portion of the dictionary.
Result<int64_t> ARROW_EXPORT ReferencedBufferSize(const ArrayData& array_data);
/// \brief Returns the sum of bytes from all buffer ranges referenced
/// \see ReferencedBufferSize(const ArrayData& array_data) for details
Result<int64_t> ARROW_EXPORT ReferencedBufferSize(const Array& array_data);
/// \brief Returns the sum of bytes from all buffer ranges referenced
/// \see ReferencedBufferSize(const ArrayData& array_data) for details
Result<int64_t> ARROW_EXPORT ReferencedBufferSize(const ChunkedArray& array_data);
/// \brief Returns the sum of bytes from all buffer ranges referenced
/// \see ReferencedBufferSize(const ArrayData& array_data) for details
Result<int64_t> ARROW_EXPORT ReferencedBufferSize(const RecordBatch& array_data);
/// \brief Returns the sum of bytes from all buffer ranges referenced
/// \see ReferencedBufferSize(const ArrayData& array_data) for details
Result<int64_t> ARROW_EXPORT ReferencedBufferSize(const Table& array_data);
```

The `arrow::util::ReferencedBufferSize` functions need to be included in the 
Cython bindings, but I am not sure where best to put them. None of the members 
of the header `arrow/util/byte_size.h` are currently included in Cython. *What 
would be a good place for these declarations?*

Secondly, each of the types these functions accept
 * ArrayData
 * Array
 * ChunkedArray
 * RecordBatch
 * Table

needs a method called `get_buffer_size` or a property `buffer_size` in its API. 
Since we focus only on the actual data buffers, the name matters for users to 
understand it clearly. 

Suggestions: `data_buffer_size`, `buffer_size` 
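
To make the distinction the name has to convey more concrete, here is a toy 
sketch of "whole buffer" versus "referenced range" sizes for a sliced array. 
It does not use Arrow at all: `ToyInt32Array`, `ToyTotalBufferSize` and 
`ToyReferencedBufferSize` are hypothetical names made up for illustration, and 
the model ignores validity bitmaps, child arrays and dictionaries. 

```c++
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-in for an int32 array: a shared buffer plus an offset/length
// window into it. Hypothetical type, not part of Arrow.
struct ToyInt32Array {
  std::shared_ptr<std::vector<int32_t>> buffer;
  int64_t offset;
  int64_t length;

  // Zero-copy slice: same underlying buffer, narrower window.
  ToyInt32Array Slice(int64_t start, int64_t count) const {
    return ToyInt32Array{buffer, offset + start, count};
  }
};

// Counts the whole underlying buffer, ignoring offset and length.
int64_t ToyTotalBufferSize(const ToyInt32Array& arr) {
  return static_cast<int64_t>(arr.buffer->size() * sizeof(int32_t));
}

// Counts only the bytes inside the [offset, offset + length) window.
int64_t ToyReferencedBufferSize(const ToyInt32Array& arr) {
  return arr.length * static_cast<int64_t>(sizeof(int32_t));
}
```

In this model, a 100-element array sliced down to 20 elements still reports 
400 bytes from the first function but 80 bytes from the second, which is the 
kind of gap the method name and docstring would need to explain. 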

> [Python][R] Expose ReferencedBufferSize to python/R
> ---------------------------------------------------
>
>                 Key: ARROW-15065
>                 URL: https://issues.apache.org/jira/browse/ARROW-15065
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python, R
>            Reporter: Weston Pace
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>              Labels: good-first-issue
>
> This could be a method on arrays, chunked arrays, record batches, and tables. 
>  This method takes array offsets into account.
> We should probably add this alongside the existing nbytes field with clear 
> commenting about the difference between the two of them.  Both can be useful 
> depending on the need.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
