WeichenXu123 edited a comment on pull request #9187: URL: https://github.com/apache/arrow/pull/9187#issuecomment-765076360
@liyafan82

> The reason is that, for variable width vectors, it is not possible to estimate the buffer size without actually filling up the vector.

Why is it not possible? For a `variableWidthVector`, the buffer layout is quite simple:

1) validity buffer: nbytes = ceil(N/8)
2) offset buffer: nbytes = OFFSET_WIDTH * (N + 1)
3) value buffer: nbytes = total byte count of all non-null values

For a `variableWidthVector`, the `setValueCount` method only does two additional things:

1) It fills holes (fills the offset buffer gap left by the trailing consecutive NULL values); it does not add any data to the value buffer.
2) It resets the reader/writer indices of these buffers (this looks like it has some side effects).

But skipping the call to `setValueCount` does not affect how we compute the vector's buffer size.

Does that make sense?
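To make the three formulas above concrete, here is a minimal sketch in plain Java that does not use the Arrow API. The class `VarWidthBufferSizeSketch` and its methods are hypothetical names introduced only for illustration, and `OFFSET_WIDTH = 4` assumes a vector with 32-bit offsets. It counts only the logical bytes and ignores any padding, alignment, or capacity rounding an allocator may apply.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Arrow): logical buffer sizes of a variable-width vector. */
final class VarWidthBufferSizeSketch {
    // Assumption: 4-byte offsets, as used by vectors with 32-bit offsets.
    static final int OFFSET_WIDTH = 4;

    /** Validity buffer: one bit per slot, rounded up to whole bytes. */
    static long validityBytes(int valueCount) {
        return (valueCount + 7L) / 8L;
    }

    /** Offset buffer: N + 1 offsets of OFFSET_WIDTH bytes each. */
    static long offsetBytes(int valueCount) {
        return (long) OFFSET_WIDTH * (valueCount + 1L);
    }

    /** Value buffer: sum of the byte lengths of all non-null values (null entry = NULL slot). */
    static long valueBytes(byte[][] values) {
        long total = 0;
        for (byte[] value : values) {
            if (value != null) {
                total += value.length;
            }
        }
        return total;
    }

    /** Sum of the three buffers, without padding or alignment. */
    static long totalBytes(byte[][] values) {
        int n = values.length;
        return validityBytes(n) + offsetBytes(n) + valueBytes(values);
    }

    public static void main(String[] args) {
        // Five slots, the last two of which are trailing NULLs.
        byte[][] values = {
            "foo".getBytes(StandardCharsets.UTF_8),
            null,
            "hello".getBytes(StandardCharsets.UTF_8),
            null,
            null
        };
        System.out.println("validity = " + validityBytes(values.length)); // 1
        System.out.println("offset   = " + offsetBytes(values.length));   // 24
        System.out.println("value    = " + valueBytes(values));           // 8
        System.out.println("total    = " + totalBytes(values));           // 33
    }
}
```

In this sketch the trailing NULL slots contribute to the validity and offset sizes through N alone and add nothing to the value buffer, which is the point being made about `setValueCount`.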
