[
https://issues.apache.org/jira/browse/ARROW-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bryan Cutler updated ARROW-11739:
---------------------------------
Description:
Following the discussion on https://github.com/apache/arrow/pull/9187.
Proposed API in BaseVariableWidthVector.java:
{code:java}
/**
* Get the potential buffer size for a particular number of records and
* density.
* @param valueCount desired number of elements in the vector
* @param density average number of bytes per variable width element
* @return estimated size of underlying buffers if the vector holds
* a given number of elements
*/
public int getBufferSizeFor(final int valueCount, double density)
{code}
The current `getBufferSizeFor(int valueCount)` for BaseVariableWidthVector
requires that the validity and offset buffers have already been allocated for
at least the given `valueCount`. If the aim of this method is to estimate
memory usage for a value count, it's not very useful because it can only give
sizes for value counts less than or equal to the capacity of the currently
allocated vector.
A better approach for approximating memory usage is to include a density
argument along with the value count. Then the buffer estimate does not require
the validity and offset buffers to have any allocation. This is also in line
with `setInitialCapacity(int valueCount, double density)`.
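As a rough illustration of the density-based estimate, here is a minimal
stand-alone sketch. It is not the Arrow implementation: the class name
`BufferSizeEstimate` is hypothetical, and it assumes one validity bit per
value, 4-byte offsets with one extra trailing offset, and no padding or
allocator rounding, any of which the real vector classes may handle
differently.

```java
/** Hypothetical stand-alone helper for illustration; not part of Arrow. */
final class BufferSizeEstimate {
    // Assumption: 4-byte offsets, as in the non-"Large" variable-width layout.
    static final int OFFSET_WIDTH = 4;

    /**
     * Estimate the total buffer size in bytes without touching any allocated
     * buffers.
     *
     * @param valueCount desired number of elements in the vector
     * @param density average number of bytes per variable width element
     */
    static int getBufferSizeFor(final int valueCount, final double density) {
        // The negated comparison also rejects NaN densities.
        if (valueCount < 0 || !(density >= 0)) {
            throw new IllegalArgumentException("valueCount and density must be >= 0");
        }
        // Assumption: an empty vector is treated as needing no buffers at all.
        if (valueCount == 0) {
            return 0;
        }
        // One validity bit per value, rounded up to whole bytes.
        final long validityBytes = (valueCount + 7L) / 8;
        // One offset per value plus a final offset marking the end of the data.
        final long offsetBytes = (valueCount + 1L) * OFFSET_WIDTH;
        // Expected data bytes, rounded up.
        final long dataBytes = (long) Math.ceil(valueCount * density);
        // Keep the proposed int return type; throws ArithmeticException on overflow.
        return Math.toIntExact(validityBytes + offsetBytes + dataBytes);
    }

    public static void main(String[] args) {
        System.out.println(getBufferSizeFor(1000, 8.0));
    }
}
```

Under these assumptions, 1000 values at a density of 8.0 bytes give
125 + 4004 + 8000 = 12129 bytes, computed without any prior allocation.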
NOTE: this API should also be added to BaseLargeVariableWidthVector and
possibly BaseRepeatedValueVector(Large) as well.
was:
Following the discussion on https://github.com/apache/arrow/pull/9187.
The current `getBufferSize(int valueCount)` for BaseVariableWidthVector
requires that validity and offset vectors have already been allocated for at
least the given `valueCount`. If the aim of this method is to estimate the
> [Java] Add API for getBufferSizeFor() with density to BaseVariableWidthVector
> -----------------------------------------------------------------------------
>
> Key: ARROW-11739
> URL: https://issues.apache.org/jira/browse/ARROW-11739
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: Bryan Cutler
> Priority: Major
>
> Following the discussion on https://github.com/apache/arrow/pull/9187.
> Proposed API in BaseVariableWidthVector.java:
> {code:java}
> /**
> * Get the potential buffer size for a particular number of records and
> * density.
> * @param valueCount desired number of elements in the vector
> * @param density average number of bytes per variable width element
> * @return estimated size of underlying buffers if the vector holds
> * a given number of elements
> */
> public int getBufferSizeFor(final int valueCount, double density)
> {code}
> The current `getBufferSizeFor(int valueCount)` for BaseVariableWidthVector
> requires that the validity and offset buffers have already been allocated for
> at least the given `valueCount`. If the aim of this method is to estimate
> memory usage for a value count, it's not very useful because it can only give
> sizes for value counts less than or equal to the capacity of the currently
> allocated vector.
> A better approach for approximating memory usage is to include a density
> argument along with the value count. Then the buffer estimate does not require
> the validity and offset buffers to have any allocation. This is also in line
> with `setInitialCapacity(int valueCount, double density)`.
> NOTE: this API should also be added to BaseLargeVariableWidthVector and
> possibly BaseRepeatedValueVector(Large) as well.