Adding the Arrow dev list.
Yes, VarCharVector.get(int index, NullableVarCharHolder holder) is a
cheaper method.
You can get the offsets from the list vector and then use the holder to
retrieve pointers into the existing memory. That memory is off-heap, so you'll
have to copy it if you want a byte array.
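For anyone new to the layout, the two levels of offsets can be modeled in plain Java. This is a minimal sketch under stated assumptions: ListOfStringsModel, build, copyElement and record are hypothetical names, not Arrow API, and a direct ByteBuffer stands in for Arrow's off-heap ArrowBuf. It shows why getting a byte[] means copying out of that memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of ListVector over VarCharVector; NOT the Arrow API.
final class ListOfStringsModel {
  final int[] listOffsets; // record r owns elements [listOffsets[r], listOffsets[r + 1])
  final int[] charOffsets; // element i owns bytes [charOffsets[i], charOffsets[i + 1])
  final ByteBuffer data;   // direct (off-heap) buffer standing in for Arrow memory

  private ListOfStringsModel(int[] listOffsets, int[] charOffsets, ByteBuffer data) {
    this.listOffsets = listOffsets;
    this.charOffsets = charOffsets;
    this.data = data;
  }

  // Lays out all strings back to back and records both levels of offsets.
  static ListOfStringsModel build(String[][] records) {
    int elementCount = 0;
    int byteCount = 0;
    for (String[] rec : records) {
      elementCount += rec.length;
      for (String s : rec) {
        byteCount += s.getBytes(StandardCharsets.UTF_8).length;
      }
    }
    int[] listOffsets = new int[records.length + 1];
    int[] charOffsets = new int[elementCount + 1];
    ByteBuffer data = ByteBuffer.allocateDirect(byteCount);
    int e = 0;
    for (int r = 0; r < records.length; r++) {
      listOffsets[r] = e;
      for (String s : records[r]) {
        charOffsets[e] = data.position();
        data.put(s.getBytes(StandardCharsets.UTF_8));
        e++;
      }
    }
    listOffsets[records.length] = e;
    charOffsets[elementCount] = data.position();
    return new ListOfStringsModel(listOffsets, charOffsets, data);
  }

  // Copies element i out of the off-heap buffer into a heap byte[].
  byte[] copyElement(int i) {
    int start = charOffsets[i];
    byte[] out = new byte[charOffsets[i + 1] - start];
    ByteBuffer view = data.duplicate(); // same memory, independent position
    view.position(start);
    view.get(out);
    return out;
  }

  // Resolves record r through the list offsets, then each element through the char offsets.
  List<String> record(int r) {
    List<String> result = new ArrayList<>();
    for (int i = listOffsets[r]; i < listOffsets[r + 1]; i++) {
      result.add(new String(copyElement(i), StandardCharsets.UTF_8));
    }
    return result;
  }

  public static void main(String[] args) {
    ListOfStringsModel m = build(new String[][] {{"a", "bc"}, {}, {"def"}});
    System.out.println(m.record(0)); // [a, bc]
    System.out.println(m.record(1)); // []
    System.out.println(m.record(2)); // [def]
  }
}
```

Here listOffsets ends up as [0, 2, 2, 3] and charOffsets as [0, 1, 3, 6]; an empty list is simply two equal list offsets.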
Pseudo code:
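int recordIndexToRead = ...;
ListVector lv = ...;
ArrowBuf offsetBuffer = lv.getOffsetBuffer();
// getDataVector() returns a FieldVector, so cast to the concrete child type
VarCharVector vc = (VarCharVector) lv.getDataVector();
// offsets are 4-byte ints; list r spans child indices [offset(r), offset(r + 1))
int listStart = offsetBuffer.getInt(recordIndexToRead * 4);
int listEnd = offsetBuffer.getInt((recordIndexToRead + 1) * 4);
NullableVarCharHolder nvh = new NullableVarCharHolder();  // reused for every element
for (int i = listStart; i < listEnd; i++) {
  vc.get(i, nvh);
  if (nvh.isSet == 0) {
    continue;  // null string element
  }
  // bytes [nvh.start, nvh.end) of nvh.buffer point into the existing off-heap data;
  // do something with data (copy out with nvh.buffer.getBytes(nvh.start, bytes) only if needed).
}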
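Since the goal is filtering, one way to keep the holder approach cheap is to test the bytes in place and copy only the elements that match. The sketch below is plain Java under stated assumptions: Holder is a hypothetical stand-in that only mirrors the isSet/start/end/buffer fields of NullableVarCharHolder, get imitates the shape of VarCharVector.get(int, NullableVarCharHolder), and the prefix test is just an example predicate.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for NullableVarCharHolder: same field names, plain Java.
final class Holder {
  int isSet;
  int start;
  int end;
  ByteBuffer buffer;
}

final class HolderFilterSketch {
  // Imitates the shape of VarCharVector.get(int, NullableVarCharHolder):
  // fills the holder with a view of element i instead of allocating a byte[].
  static void get(ByteBuffer data, int[] offsets, boolean[] valid, int i, Holder h) {
    h.isSet = valid[i] ? 1 : 0;
    h.start = offsets[i];
    h.end = offsets[i + 1];
    h.buffer = data;
  }

  // True if bytes [h.start, h.end) begin with prefix; reads in place, no copy.
  static boolean startsWith(Holder h, byte[] prefix) {
    if (h.end - h.start < prefix.length) {
      return false;
    }
    for (int k = 0; k < prefix.length; k++) {
      if (h.buffer.get(h.start + k) != prefix[k]) {
        return false;
      }
    }
    return true;
  }

  // Scans elements [from, to), i.e. one list's range, copying only the matches.
  static List<byte[]> filter(ByteBuffer data, int[] offsets, boolean[] valid,
                             int from, int to, byte[] prefix) {
    List<byte[]> matches = new ArrayList<>();
    Holder h = new Holder(); // reused for every element
    for (int i = from; i < to; i++) {
      get(data, offsets, valid, i, h);
      if (h.isSet == 0 || !startsWith(h, prefix)) {
        continue; // nulls and non-matches are never copied
      }
      byte[] copy = new byte[h.end - h.start];
      for (int k = 0; k < copy.length; k++) {
        copy[k] = h.buffer.get(h.start + k);
      }
      matches.add(copy);
    }
    return matches;
  }

  public static void main(String[] args) {
    byte[] bytes = "applebananaapricot".getBytes(StandardCharsets.US_ASCII);
    ByteBuffer data = ByteBuffer.allocateDirect(bytes.length);
    data.put(bytes);
    int[] offsets = {0, 5, 11, 18}; // "apple", "banana", "apricot"
    boolean[] valid = {true, true, true};
    byte[] prefix = "ap".getBytes(StandardCharsets.US_ASCII);
    for (byte[] m : filter(data, offsets, valid, 0, 3, prefix)) {
      System.out.println(new String(m, StandardCharsets.US_ASCII));
    }
  }
}
```

With the sample data, filtering for the prefix "ap" prints apple and apricot; "banana" is rejected without ever being copied to the heap.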
On Fri, Aug 31, 2018 at 2:08 AM Xu,Wenjian wrote:
> Hi Jacques,
>
> I have a question about ListVector in Arrow Java API. Thanks for your kind
> help.
>
> I would like to iterate through *array<string>* in SQL semantics.
>
> I understand that, in order to represent *array<string>* in Arrow format,
> I could use ListVector with VarCharVector as the inner list. My question
> is, how to efficiently access all the elements (i.e., each byte[] as
> string)?
>
> By checking the test code:
>
> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestListVector.java
>
> one option is to use ListVector.getObject(int index) to get each
> ArrayList<Text>, and then access each element in ArrayList<Text>. But this
> method is expensive because:
>
> 1) it calls VarCharVector.get(int index), which involves a memory copy
> 2) it calls Text.set(byte[]), which assembles the Text from the byte array.
>
> My goal is just to retrieve each byte[] and do some filtering. Is there
> any other less expensive method to achieve my goal? For example,
> VarCharVector.get(int index, NullableVarCharHolder holder) seems to be a
> less-expensive operation. But how to use this method in my case?
>
> Thanks again.
>
> Best regards,
> Wenjian
>