Thanks for your answer. So valueCount is the number of values actually
filled in the vector.

Then I would like to ask why valueCount is still 0 after setting some
values. For example, in this test:
https://github.com/apache/arrow/blob/3fbbcdaf77a9e354b6bd07ec1fd1dac005a505c9/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L609


System.out.print(vector.getValueCount()); // prints 0
/* populate the vector */
vector.set(0, 100.5f);
vector.set(2, 201.5f);
vector.set(4, 300.3f);
vector.set(6, 423.8f);
vector.set(8, 555.6f);
vector.set(10, 66.6f);
vector.set(12, 78.8f);
vector.set(14, 89.5f);
System.out.print(vector.getValueCount()); // prints 0


Both of the print lines I added print 0.
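
A toy model of the behavior in question may help frame it. This is only a sketch and not Arrow code: ToyFloatVector is a hypothetical class, and it assumes that set() writes a slot and its validity flag but leaves valueCount alone, while only setValueCount() changes the logical length. Unlike a real vector, it rejects a count above capacity instead of reallocating.

```java
/** Toy model only; hypothetical class, not part of Apache Arrow. */
final class ToyFloatVector {
    private final float[] values;    // capacity is values.length
    private final boolean[] valid;   // one validity flag per slot
    private int valueCount = 0;      // logical length, changed only by setValueCount

    ToyFloatVector(int capacity) {
        this.values = new float[capacity];
        this.valid = new boolean[capacity];
    }

    int getValueCapacity() {
        return values.length;
    }

    int getValueCount() {
        return valueCount;
    }

    /** Writes the slot and marks it valid; does NOT touch valueCount. */
    void set(int index, float value) {
        values[index] = value;
        valid[index] = true;
    }

    /** Declares how many leading slots are part of the vector. */
    void setValueCount(int count) {
        if (count < 0 || count > values.length) {
            // Simplification of this sketch: no reallocation.
            throw new IllegalArgumentException("count out of range: " + count);
        }
        valueCount = count;
    }

    /** Returns 1 if the slot was written, 0 otherwise. */
    int isSet(int index) {
        return valid[index] ? 1 : 0;
    }
}

class ValueCountDemo {
    public static void main(String[] args) {
        ToyFloatVector vector = new ToyFloatVector(16);
        vector.set(0, 100.5f);
        vector.set(2, 201.5f);
        System.out.println(vector.getValueCount()); // prints 0
        vector.setValueCount(3);
        System.out.println(vector.getValueCount()); // prints 3
        System.out.println(vector.isSet(1));        // prints 0
    }
}
```

Under that assumption, a count of 0 after several set() calls is expected until setValueCount() is called explicitly.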


Also, if I add the following check to isSet, some tests fail again:

if (valueCount == getValueCapacity()) {
  return 1;
}



Thanks,


Azim Afroozeh

On Fri, Nov 8, 2019 at 10:57 AM Fan Liya <liya.fa...@gmail.com> wrote:

> Hi Azim,
>
> I think we should be aware of two distinct concepts:
>
> 1. vector capacity: the max number of values that can be stored in the
> vector, without reallocation
> 2. vector length: the number of values actually filled in the vector
>
> For any valid vector, we always have vector length <= vector capacity.
>
> The allocateNew method expands the vector capacity, but it does not fill in
> any value, so it does not affect the vector length.
>
> For the code above, if the vector length is 0, the value of isSet(index)
> (where index > 0) should be undefined. So throwing an exception is the
> correct behavior.
>
> Hope this answers your question.
>
> Best,
> Liya Fan
>
>
> On Fri, Nov 8, 2019 at 5:38 PM azim afroozeh <afrooz...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I have a question about the Java implementation of Apache Arrow. Should
> > we always call setValueCount after creating a vector with allocateNew()?
> >
> > I can see that in some tests setValueCount is called immediately
> > after allocateNew, for example here:
> >
> > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L285
> >
> > but not in other tests:
> >
> > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L792
> >
> > To illustrate the problem further, if I change the isSet(int index)
> > function as follows:
> >
> > public int isSet(int index) {
> >   if (valueCount == 0) {
> >     return 0;
> >   }
> >   final int byteIndex = index >> 3;
> >   final byte b = validityBuffer.getByte(byteIndex);
> >   final int bitIndex = index & 7;
> >   return (b >> bitIndex) & 0x01;
> > }
> >
> > many tests fail, although logically they should not: if valueCount is 0,
> > isSet should return 0 for every index. The problem comes from the
> > allocateNew method, which does not initialize the valueCount variable.
> >
> > One potential solution to this problem is to initialize valueCount
> > in the allocateNew function, as I did here:
> >
> > https://github.com/azimafroozeh/arrow/commit/4281613b7ed1370252a155192f12b9bca494dbeb
> >
> > The classes BaseVariableWidthVector and BaseFixedWidthVector both have
> > an allocateNew function that needs to be changed. Is this an acceptable
> > approach, or am I missing some semantics?
> >
> > Thanks,
> >
> > Azim Afroozeh
> >
>
