BryanCutler commented on pull request #9187: URL: https://github.com/apache/arrow/pull/9187#issuecomment-766152565
I think the intention of `getBufferSizeFor(final int valueCount)` is to provide an estimated buffer size for the vector, and it doesn't make sense that the vector should have to be in a particular state to get that estimate. Even calling `setValueCount()` first doesn't give a good estimate, since that will just fill in empty data. Since this is a variable-width vector, it also doesn't make sense to try to derive that estimate from a `valueCount` alone.

A better way to estimate the buffer size would be to include a `density` value for the average number of bytes per record, similar to `setInitialCapacity(int valueCount, double density)`. You could then get the density from a previous vector and use it to estimate the size of the next vector:

```java
int batch_size = 123;
// Average bytes per record, taken from the previous vector
double prev_density = prev_vector.getDensity();
// Proposed overload: estimate from the value count and density
int estimated_size = new_vector.getBufferSizeFor(batch_size, prev_density);
```

This new `getBufferSizeFor()` would not require the vector to be in any particular state, and `setValueCount()` would not need to be called beforehand.

What are your thoughts on this, and does that work for your use case @WeichenXu123 ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
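To make the arithmetic behind the density-based estimate above concrete, here is a minimal standalone sketch. It is not Arrow's implementation: the class `DensityEstimate` and its methods are hypothetical, and it assumes a variable-width layout with one validity bit per value, 4-byte offsets (one more than the value count), and `density` average data bytes per value, ignoring any padding or alignment that a real allocator would add.

```java
// Hypothetical sketch, not part of Apache Arrow.
public final class DensityEstimate {
    // Assumption: 4-byte offsets, as in a 32-bit-offset variable-width layout.
    private static final int OFFSET_WIDTH = 4;

    /** Estimated total bytes for valueCount values averaging `density` bytes each. */
    public static long estimateBufferSize(int valueCount, double density) {
        if (valueCount < 0 || density < 0) {
            throw new IllegalArgumentException("valueCount and density must be non-negative");
        }
        if (valueCount == 0) {
            return 0; // choice made for this sketch: an empty vector needs no buffers
        }
        long validityBytes = (valueCount + 7L) / 8;          // one bit per value, rounded up
        long offsetBytes = (valueCount + 1L) * OFFSET_WIDTH; // n + 1 offsets
        long dataBytes = (long) Math.ceil(valueCount * density);
        return validityBytes + offsetBytes + dataBytes;
    }

    /** Average data bytes per value observed in an earlier batch. */
    public static double density(long dataBytes, int valueCount) {
        return valueCount == 0 ? 0.0 : (double) dataBytes / valueCount;
    }

    public static void main(String[] args) {
        // Previous batch: 100 values holding 1000 data bytes in total.
        double prevDensity = density(1000, 100);
        // 16 validity bytes + 496 offset bytes + 1230 data bytes
        System.out.println(estimateBufferSize(123, prevDensity)); // prints 1742
    }
}
```

The estimate depends only on its two arguments, so nothing about the target vector's state is involved, which is the property the proposal is after.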
