Hi everyone,

I have a question about the Java implementation of Apache Arrow. Should we
always call setValueCount after creating a vector with allocateNew()?

In some tests, setValueCount is called immediately after allocateNew,
for example here:
https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L285
but not in others:
https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L792

To illustrate the problem, suppose I change the isSet(int index) function
as follows:

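public int isSet(int index) {
  // Added guard: an empty vector has no set values.
  if (valueCount == 0) {
    return 0;
  }
  // Locate the validity bit for this index: byte = index / 8, bit = index % 8.
  final int byteIndex = index >> 3;
  final byte b = validityBuffer.getByte(byteIndex);
  final int bitIndex = index & 7;
  return (b >> bitIndex) & 0x01;
}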

With this change many tests fail, although logically they should not: if
valueCount is 0, isSet should return 0 for every index. The problem seems
to come from the allocateNew method, which does not initialize the
valueCount variable.
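
To make the failure mode concrete, here is a toy model in plain Java. It is
not Arrow's code: ToyVector and all of its methods are made up for
illustration, and a byte array stands in for the validity buffer. It only
shows how a slot can be written while valueCount is still 0, so the guarded
isSet reports it as unset.

```java
// Toy model only: NOT Arrow's implementation. All names are hypothetical.
final class ToyVector {
    private byte[] validity = new byte[0];
    private int valueCount; // stays at the Java default of 0 until set

    // Allocates a zeroed validity bitmap; valueCount is left untouched.
    void allocateNew(int capacity) {
        validity = new byte[(capacity + 7) / 8];
    }

    // Marks a slot as non-null; does not change valueCount.
    void set(int index) {
        validity[index >> 3] |= (byte) (1 << (index & 7));
    }

    void setValueCount(int valueCount) {
        this.valueCount = valueCount;
    }

    // Same shape as the modified isSet in this email.
    int isSetGuarded(int index) {
        if (valueCount == 0) {
            return 0;
        }
        return isSetUnguarded(index);
    }

    // Same shape as isSet without the guard.
    int isSetUnguarded(int index) {
        final byte b = validity[index >> 3];
        return (b >> (index & 7)) & 0x01;
    }

    public static void main(String[] args) {
        ToyVector v = new ToyVector();
        v.allocateNew(8);
        v.set(3);
        System.out.println(v.isSetUnguarded(3)); // 1: the bit was written
        System.out.println(v.isSetGuarded(3));   // 0: valueCount is still 0
        v.setValueCount(4);
        System.out.println(v.isSetGuarded(3));   // 1: guard no longer fires
    }
}
```

In this model the guarded and unguarded versions disagree only between the
first write and the call to setValueCount, which matches the pattern of
tests that write values without calling setValueCount first.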

One potential solution is to initialize valueCount in the allocateNew
function, as I did here:
https://github.com/azimafroozeh/arrow/commit/4281613b7ed1370252a155192f12b9bca494dbeb
Both BaseVariableWidthVector and BaseFixedWidthVector have an allocateNew
function that would need to change. Is this an acceptable approach, or am
I missing some semantics?
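
Roughly, the idea looks like the sketch below. It is a simplified toy in
plain Java rather than a copy of the commit, and it assumes that
"initializing" means recording the count passed to allocateNew.
ToyCountedVector and its methods are made-up names, not Arrow classes.

```java
// Toy model only: NOT Arrow's implementation and not the linked commit.
final class ToyCountedVector {
    private byte[] validity = new byte[0];
    private int valueCount;

    // Assumed change: allocateNew also records the requested value count.
    void allocateNew(int valueCount) {
        validity = new byte[(valueCount + 7) / 8];
        this.valueCount = valueCount;
    }

    // Marks a slot as non-null.
    void set(int index) {
        validity[index >> 3] |= (byte) (1 << (index & 7));
    }

    int getValueCount() {
        return valueCount;
    }

    // Guarded isSet, same shape as the modified version in this email.
    int isSet(int index) {
        if (valueCount == 0) {
            return 0;
        }
        final byte b = validity[index >> 3];
        return (b >> (index & 7)) & 0x01;
    }

    public static void main(String[] args) {
        ToyCountedVector v = new ToyCountedVector();
        v.allocateNew(8);
        v.set(3);
        System.out.println(v.isSet(3)); // 1: written slot is visible
        System.out.println(v.isSet(2)); // 0: untouched slot stays unset
    }
}
```

With the count recorded at allocation time, the guard no longer hides
written slots, and untouched slots still read as unset because the bitmap
starts out zeroed. Whether valueCount is meant to carry that meaning in
Arrow is exactly the semantics I am unsure about.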

Thanks,

Azim Afroozeh
