Slight correction to the code (read the offsets through the offsetVector local rather than lv.offsetBuffer):

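int recordIndexToRead = ...
ListVector lv = ...
// Offsets are 4-byte ints, so entry i lives at byte i * 4.
ArrowBuf offsetVector = lv.getOffsetBuffer();
// getDataVector() returns a FieldVector, so cast to the child type.
VarCharVector vc = (VarCharVector) lv.getDataVector();
int listStart = offsetVector.getInt(recordIndexToRead * 4);
int listEnd = offsetVector.getInt((recordIndexToRead + 1) * 4);
NullableVarCharHolder nvh = new NullableVarCharHolder();
for (int i = listStart; i < listEnd; i++) {
  vc.get(i, nvh);
  // If nvh.isSet == 1, bytes nvh.start..nvh.end of nvh.buffer hold element i.
  // do something with data.
}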
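
For anyone who wants to see the offset arithmetic without Arrow on the classpath, here is a minimal sketch that models the layout with direct ByteBuffers. It is not the Arrow API: the class ListOffsetsSketch and its methods packOffsets, copyRange, and readList are hypothetical names made up for illustration, and the little-endian 4-byte offsets are an assumption about the layout. It only shows how the list offsets select a range of child elements, and how each element's bytes are copied out of off-heap memory into a byte[].

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of a list-of-strings layout; hypothetical, not the Arrow API. */
final class ListOffsetsSketch {

  /** Packs int offsets into a little-endian direct (off-heap) buffer, 4 bytes each. */
  static ByteBuffer packOffsets(int... offsets) {
    ByteBuffer buf = ByteBuffer.allocateDirect(offsets.length * 4)
        .order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < offsets.length; i++) {
      buf.putInt(i * 4, offsets[i]);
    }
    return buf;
  }

  /** Copies bytes [start, end) out of the off-heap buffer into a new byte[]. */
  static byte[] copyRange(ByteBuffer data, int start, int end) {
    byte[] out = new byte[end - start];
    ByteBuffer view = data.duplicate(); // independent position, same memory
    view.position(start);
    view.get(out);
    return out;
  }

  /** Reads every string of one record, mirroring the loop in the email. */
  static List<String> readList(ByteBuffer listOffsets, ByteBuffer stringOffsets,
                               ByteBuffer data, int recordIndex) {
    int listStart = listOffsets.getInt(recordIndex * 4);
    int listEnd = listOffsets.getInt((recordIndex + 1) * 4);
    List<String> result = new ArrayList<>();
    for (int i = listStart; i < listEnd; i++) {
      // Stand-in for what the holder exposes: start and end of element i.
      int start = stringOffsets.getInt(i * 4);
      int end = stringOffsets.getInt((i + 1) * 4);
      result.add(new String(copyRange(data, start, end), StandardCharsets.UTF_8));
    }
    return result;
  }

  public static void main(String[] args) {
    // Two records: ["a", "bc"] and ["def"], stored back to back as "abcdef".
    ByteBuffer data = ByteBuffer.allocateDirect(6);
    data.put("abcdef".getBytes(StandardCharsets.UTF_8));
    ByteBuffer stringOffsets = packOffsets(0, 1, 3, 6);
    ByteBuffer listOffsets = packOffsets(0, 2, 3);

    System.out.println(readList(listOffsets, stringOffsets, data, 0)); // prints [a, bc]
    System.out.println(readList(listOffsets, stringOffsets, data, 1)); // prints [def]
  }
}
```

If the filtering can be done directly on the off-heap bytes, the copyRange step can be skipped, which is the main saving over building Text objects.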

On Fri, Aug 31, 2018 at 9:04 AM Jacques Nadeau <[email protected]> wrote:

> Adding the Arrow dev list.
>
> Yes, VarCharVector.get(int index, NullableVarCharHolder holder) is a
> cheaper method.
>
> You can get the offsets from the list vector and then use the holder to
> retrieve pointers into the existing memory. That memory is off-heap, so
> you'll have to do a copy if you want a byte array.
>
> Pseudo code:
>
> int recordIndexToRead = ...
> ListVector lv = ...
> ArrowBuf offsetVector = lv.getOffsetBuffer();
> VarCharVector vc = lv.getDataVector();
> int listStart = lv.offsetBuffer.getInt((recordIndexToRead ) * 4) ;
> int listEnd = lv.offsetBuffer.getInt((recordIndexToRead + 1) * 4);
> NullableVarCharHolder nvh = new NullableVarCharHolder();
> for(int i = listStart; i < listEnd; i++){
>   vc.get(i, nvh);
>   // do something with data.
> }
>
> On Fri, Aug 31, 2018 at 2:08 AM Xu,Wenjian <[email protected]> wrote:
>
>> Hi Jacques,
>>
>> I have a question about ListVector in Arrow Java API. Thanks for your
>> kind help.
>>
>> I would like to iterate through *array<string>* in SQL semantics.
>>
>> I understand that, in order to represent *array<string>* in Arrow
>> format, I could use ListVector with VarCharVector as the inner vector. My
>> question is, how do I efficiently access all the elements (i.e., each
>> byte[] as a string)?
>>
>> By checking the test code:
>>
>> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestListVector.java
>>
>> one option is to use ListVector.getObject(int index) to get each
>> ArrayList<Text>, and then access each element in ArrayList<Text>. But this
>> method is expensive because:
>>
>> 1) it calls VarCharVector.get(int index), which involves a memory copy
>> 2) it calls Text.set(byte[]), which assembles the Text from the byte array.
>>
>> My goal is just to retrieve each byte[] and do some filtering. Is there
>> any other less expensive method to achieve my goal? For example,
>> VarCharVector.get(int index, NullableVarCharHolder holder) seems to be a
>> less-expensive operation. But how to use this method in my case?
>>
>> Thanks again.
>>
>> Best regards,
>> Wenjian
>>
>
