paul-rogers commented on PR #2943: URL: https://github.com/apache/drill/pull/2943#issuecomment-2362833617
@rymarm, thanks for finding this issue. I believe your fix is correct. The `prepareWrite()` call exists to fill any remaining bits with 0s (i.e. "fill empties."). The number to be filled is, indeed, `valueCount - 1`. So, I _think_ this is a good fix. I would feel more confident if we had a unit test. That this bug exists suggests that, despite the rather excessive number of tests I wrote way back when, I somehow omitted this case. Perhaps a reason is that the bit vector is special: it uses code different than the other vectors, and so needs its own tests. Tests for the fixed-width vectors are in `TestFixedWidthWriter` and `TestFillEmpties`. Perhaps you can whip up a `TestBitVector` that does a few tests, including for the special case you uncovered. In particular, it would be good to make sure that the following cases work: * Write some values to the bit vector to simulate the first few records setting the bits. * Claim that the batch has ended and the batch wrote a bunch of other records without writing to the bit vector. * Call `setValueCount()` with a number that is a multiple of 8, a multiple of 8 plus one, and a multiple of 8 minus one. (three cases) * Do this also so the value count/8 equals the initial allocation size, is below that size, or is above that size (three cases) This gives us 9 cases, some that will force a resize, some that should just fit in the initial allocation. Then, read the bits out of the vector, verifying that the "empty" bits are the `defaultValue` (the "default default" is 0). Why write tests? It is less embarrassing for a test to point out a problem than to find it in a production system. I apologize that I somehow missed doing the tests in the first place. The second concern is whether this same error was made for other writers. The answer seems to be "no": the other writers use a different structure for `setValueCount()`. As the comment in `BitColumnWriter` says, it is the only vector that defers to the mechanism in the vector's own mutator. The result is that the bit vector code differs from the other vectors which handle the entire write process themselves. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org