paul-rogers commented on PR #2943:
URL: https://github.com/apache/drill/pull/2943#issuecomment-2362833617

   @rymarm, thanks for finding this issue. I believe your fix is correct. The 
`prepareWrite()` call exists to fill any remaining bits with 0s (i.e. "fill 
empties."). The number to be filled is, indeed, `valueCount - 1`. So, I _think_ 
this is a good fix.
   
   I would feel more confident if we had a unit test. That this bug exists 
suggests that, despite the rather excessive number of tests I wrote way back 
when, I somehow omitted this case. Perhaps a reason is that the bit vector is 
special: it uses code different than the other vectors, and so needs its own 
tests. Tests for the fixed-width vectors are in `TestFixedWidthWriter` and 
`TestFillEmpties`. Perhaps you can whip up a `TestBitVector` that does a few 
tests, including for the special case you uncovered. In particular, it would be 
good to make sure that the following cases work:
   
   * Write some values to the bit vector to simulate the first few records 
setting the bits.
   * Claim that the batch has ended and the batch wrote a bunch of other 
records without writing to the bit vector.
   * Call `setValueCount()` with a number that is a multiple of 8, a multiple 
of 8 plus one, and a multiple of 8 minus one. (three cases)
   * Do this also so the value count/8 equals the initial allocation size, is 
below that size, or is above that size (three cases)
   
   This gives us 9 cases, some that will force a resize, some that should just 
fit in the initial allocation.
   
   Then, read the bits out of the vector, verifying that the "empty" bits are 
the `defaultValue` (the "default default" is 0).
   
   Why write tests? It is less embarrassing for a test to point out a problem 
than to find it in a production system. I apologize that I somehow missed doing 
the tests in the first place.
   
   The second concern is whether this same error was made for other writers. 
The answer seems to be "no": the other writers use a different structure for 
`setValueCount()`. As the comment in `BitColumnWriter` says, it is the only 
vector that defers to the mechanism in the vector's own mutator. The result is 
that the bit vector code differs from the other vectors which handle the entire 
write process themselves.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to