WeichenXu123 edited a comment on pull request #9187:
URL: https://github.com/apache/arrow/pull/9187#issuecomment-765076360


   @liyafan82 
   
   > The reason is that, for variable width vectors, it is not possible to 
estimate the buffer size without actually filling up the vector. 
   
   Why is that not possible? For variable-width vectors, the buffer layout is quite 
simple:
   1) validity buffer: nbytes = ceil(N/8) 
   2) offset buffer: nbytes = OFFSET_WIDTH * (N + 1)
   3) value buffer: nbytes = sum of the byte lengths of all non-null values
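
As a rough illustration of the three formulas above, here is a minimal, self-contained sketch that does not use the Arrow API. The class and method names are hypothetical, `OFFSET_WIDTH = 4` assumes 32-bit offsets, and any padding or rounding applied by the allocator is ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Arrow: estimates buffer sizes for N variable-width values.
final class VarWidthBufferSize {
    // Assumption: 32-bit offsets (4 bytes per offset entry).
    static final int OFFSET_WIDTH = 4;

    // 1) validity buffer: one bit per value, rounded up to whole bytes.
    static long validityBytes(int n) {
        return (n + 7L) / 8;
    }

    // 2) offset buffer: N + 1 offsets of OFFSET_WIDTH bytes each.
    static long offsetBytes(int n) {
        return (long) OFFSET_WIDTH * (n + 1);
    }

    // 3) value buffer: total byte length of all non-null values (null entries contribute 0).
    static long valueBytes(List<byte[]> values) {
        long total = 0;
        for (byte[] v : values) {
            if (v != null) {
                total += v.length;
            }
        }
        return total;
    }

    static long totalBytes(List<byte[]> values) {
        int n = values.size();
        return validityBytes(n) + offsetBytes(n) + valueBytes(values);
    }

    public static void main(String[] args) {
        List<byte[]> values = Arrays.asList(
            "ab".getBytes(StandardCharsets.UTF_8),
            null,
            "cde".getBytes(StandardCharsets.UTF_8));
        // validity = 1, offsets = 4 * 4 = 16, values = 2 + 3 = 5
        System.out.println(totalBytes(values)); // prints 22
    }
}
```

None of these quantities depends on whether `setValueCount` has been called, only on N and the bytes written so far.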
   
   For `variableWidthVector`, the `setValueCount` method only does two additional 
things:
   1) it fills holes (fills the offset buffer gap for the trailing consecutive NULL values); 
this won't add any data to the value buffer.
   2) it resets the offset reader/writer index (this looks like it has some side effects)
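
To make the "fill holes" step concrete, here is a simplified sketch of the idea. It is not the actual Arrow implementation: the class name, the plain `int[]` offsets array, and the `lastSet` parameter are assumptions made for illustration. Each trailing unset slot gets an end offset equal to its start offset, so the number of used value bytes stays the same.

```java
import java.util.Arrays;

// Hypothetical, simplified model of hole filling for trailing NULL slots.
final class FillHolesSketch {
    // offsets has valueCount + 1 entries; lastSet is the index of the last
    // slot that was actually written (-1 if none).
    static void fillHoles(int[] offsets, int lastSet, int valueCount) {
        for (int i = lastSet + 1; i < valueCount; i++) {
            // An empty slot ends where it starts: zero bytes of value data.
            offsets[i + 1] = offsets[i];
        }
    }

    public static void main(String[] args) {
        // Slots 0 and 1 hold 2 and 3 bytes; slots 2 and 3 are trailing NULLs.
        int[] offsets = {0, 2, 5, 0, 0};
        int lastSet = 1;
        int usedBefore = offsets[lastSet + 1];          // 5 value bytes
        fillHoles(offsets, lastSet, 4);
        int usedAfter = offsets[4];                     // still 5 value bytes
        System.out.println(Arrays.toString(offsets));   // prints [0, 2, 5, 5, 5]
        System.out.println(usedBefore == usedAfter);    // prints true
    }
}
```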
   
   But skipping the `setValueCount` call does not affect how we compute the 
vector buffer size.
   
   Does that make sense?
   


