Thanks for your answer. So valueCount is the number of values actually filled into the vector.
Then I would like to ask why valueCount is still 0 after setting some values. For example, in this test:

https://github.com/apache/arrow/blob/3fbbcdaf77a9e354b6bd07ec1fd1dac005a505c9/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L609

    System.out.print(vector.getValueCount()); // prints 0

    /* populate the vector */
    vector.set(0, 100.5f);
    vector.set(2, 201.5f);
    vector.set(4, 300.3f);
    vector.set(6, 423.8f);
    vector.set(8, 555.6f);
    vector.set(10, 66.6f);
    vector.set(12, 78.8f);
    vector.set(14, 89.5f);

    System.out.print(vector.getValueCount()); // prints 0

If I add these two print lines, both print 0.

Also, if I add the following code to isSet, some tests fail again:

    if (valueCount == getValueCapacity()) {
      return 1;
    }

Thanks,
Azim Afroozeh

On Fri, Nov 8, 2019 at 10:57 AM Fan Liya <liya.fa...@gmail.com> wrote:

> Hi Azim,
>
> I think we should be aware of two distinct concepts:
>
> 1. vector capacity: the max number of values that can be stored in the
>    vector, without reallocation
> 2. vector length: the number of values actually filled in the vector
>
> For any valid vector, we always have vector length <= vector capacity.
>
> The allocateNew method expands the vector capacity, but it does not fill in
> any value, so it does not affect the vector length.
>
> For the code above, if the vector length is 0, the value of isSet(index)
> (where index > 0) should be undefined. So throwing an exception is the
> correct behavior.
>
> Hope this answers your question.
>
> Best,
> Liya Fan
>
> On Fri, Nov 8, 2019 at 5:38 PM azim afroozeh <afrooz...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I have a question about the Java implementation of Apache Arrow. Should we
> > always call setValueCount after creating a vector with allocateNew()?
> >
> > I can see that in some tests setValueCount is called immediately after
> > allocateNew, for example here:
> >
> > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L285
> >
> > but not in other tests:
> >
> > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L792
> >
> > To illustrate the problem further, if I change the isSet(int index)
> > function as follows:
> >
> >     public int isSet(int index) {
> >       if (valueCount == 0) {
> >         return 0;
> >       }
> >       final int byteIndex = index >> 3;
> >       final byte b = validityBuffer.getByte(byteIndex);
> >       final int bitIndex = index & 7;
> >       return (b >> bitIndex) & 0x01;
> >     }
> >
> > many tests fail, although logically they should not: if valueCount is 0,
> > isSet should return 0 for every index. The problem comes from the
> > allocateNew method, which does not initialize the valueCount variable.
> >
> > One potential solution to this problem is to initialize valueCount in
> > the allocateNew function, as I did here:
> >
> > https://github.com/azimafroozeh/arrow/commit/4281613b7ed1370252a155192f12b9bca494dbeb
> >
> > The classes BaseVariableWidthVector and BaseFixedWidthVector both have an
> > allocateNew function that needs to be changed. Is this an acceptable
> > approach, or am I missing some semantics?
> >
> > Thanks,
> >
> > Azim Afroozeh
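
To make the capacity versus value count distinction in this thread concrete, here is a minimal, self-contained sketch. It does not use Arrow at all: ToyFloatVector and ValueCountDemo are hypothetical stand-ins written for illustration, and the only parts taken from the thread are the method names and the bit arithmetic in isSet. The sketch assumes that set() writes the value and its validity bit but leaves valueCount alone, which is consistent with the "prints 0" observation above, and that only setValueCount() changes the logical length.

```java
// Hypothetical stand-in for a fixed-width float vector; NOT Arrow's implementation.
final class ToyFloatVector {
    private byte[] validity = new byte[0]; // one bit per slot, 1 = a value was set
    private float[] values = new float[0];
    private int valueCount = 0;            // logical length; changed only by setValueCount

    // Reserve room for `capacity` slots. No value is filled in, so the length stays 0.
    void allocateNew(int capacity) {
        validity = new byte[(capacity + 7) >> 3]; // round up to whole bytes
        values = new float[capacity];
        valueCount = 0;
    }

    int getValueCapacity() { return values.length; }

    int getValueCount() { return valueCount; }

    // Store a value and mark its validity bit. Deliberately does not touch valueCount.
    void set(int index, float value) {
        validity[index >> 3] |= (byte) (1 << (index & 7));
        values[index] = value;
    }

    // Declare how many slots are logically part of the vector.
    void setValueCount(int count) {
        if (count < 0 || count > getValueCapacity()) {
            throw new IllegalArgumentException("count out of range: " + count);
        }
        valueCount = count;
    }

    // Same bit arithmetic as the isSet quoted in the thread: byte index >> 3, bit index & 7.
    int isSet(int index) {
        final int byteIndex = index >> 3;
        final byte b = validity[byteIndex];
        final int bitIndex = index & 7;
        return (b >> bitIndex) & 0x01;
    }
}

class ValueCountDemo {
    public static void main(String[] args) {
        ToyFloatVector vector = new ToyFloatVector();
        vector.allocateNew(16);
        System.out.println(vector.getValueCount()); // prints 0

        vector.set(0, 100.5f);
        vector.set(2, 201.5f);
        vector.set(14, 89.5f);
        System.out.println(vector.getValueCount()); // still prints 0

        vector.setValueCount(15);
        System.out.println(vector.getValueCount()); // prints 15
        System.out.println(vector.isSet(2));        // prints 1
        System.out.println(vector.isSet(3));        // prints 0
    }
}
```

In this toy model the validity bits are readable whether or not setValueCount has been called, so a guard such as "if (valueCount == 0) return 0;" in isSet would change the result for any caller that sets values first and declares the length afterwards.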