Hi Wes,

Thank you for your kind help.

Actually, I am working on a Java UDF that iterates over an *array<string>* column in SQL. I understand that, to represent *array<string>* in Arrow format, I could use a ListVector with a VarCharVector as the inner vector. My question is: how can I efficiently access all the elements (i.e., each byte[] as a string)?

Looking at the test code:

https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestListVector.java

one option is to use ListVector.getObject(int index) to get each ArrayList<Text>, and then iterate over the elements of that ArrayList<Text>. But this method is expensive because:

1) it calls VarCharVector.get(int index), which involves a memory copy
2) it calls Text.set(byte[]), which assembles the Text from the byte array

My goal is just to retrieve each byte[] and do some filtering. Is there a less expensive way to achieve this? For example, VarCharVector.get(int index, NullableVarCharHolder holder) seems to be a less expensive operation. But how would I use this method in my case?

Thanks again.

Best regards,
Wenjian

On Wed, Aug 15, 2018 at 3:19 AM Wes McKinney <[email protected]> wrote:

> hi Wenjian,
>
> In C++ you can use ListBuilder together with UInt8Builder. There are
> examples of using ListBuilder you can look at in
> src/arrow/array-test.cc.
>
> For Java you might want to have a look at how Spark SQL converts its
> Array<T> types into Arrow (there should be other examples in the Java
> unit test suite, too):
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
>
> - Wes
>
> On Mon, Aug 13, 2018 at 6:00 AM, Xu,Wenjian <[email protected]> wrote:
> > Hi,
> >
> > If I want to create list<list<byte>> structure (as shown in
> > https://arrow.apache.org/docs/memory_layout.html), what class(es) do I need
> > to use in Java API and C++ API?
> >
> > Any suggestion would be appreciated. Thanks.
> >
> > Best regards,
> > Wenjian
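
To make the access pattern in question concrete, here is a minimal plain-Java sketch of the list<string> layout. It is not the Arrow API: the class `ListOfStringsScan` and its fields are hypothetical stand-ins for the list offset buffer, the string offset buffer, and the data buffer. It ignores validity bitmaps (nulls) entirely. The idea it illustrates is that a filter can walk the two levels of offsets and read bytes in place, which is presumably what the `start`/`end`/`buffer` fields of a NullableVarCharHolder would allow, instead of building an ArrayList<Text> per row.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Plain-Java model of a list<string> column; hypothetical, NOT the Arrow API. */
public class ListOfStringsScan {
    final int[] listOffsets;   // row i owns strings [listOffsets[i], listOffsets[i+1])
    final int[] valueOffsets;  // string j owns bytes [valueOffsets[j], valueOffsets[j+1])
    final ByteBuffer data;     // all UTF-8 bytes stored back to back

    ListOfStringsScan(List<List<String>> rows) {
        listOffsets = new int[rows.size() + 1];
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (int i = 0; i < rows.size(); i++) {
            for (String s : rows.get(i)) {
                byte[] b = s.getBytes(StandardCharsets.UTF_8);
                encoded.add(b);
                total += b.length;
            }
            listOffsets[i + 1] = encoded.size();
        }
        valueOffsets = new int[encoded.size() + 1];
        data = ByteBuffer.allocate(total);
        for (int j = 0; j < encoded.size(); j++) {
            data.put(encoded.get(j));
            valueOffsets[j + 1] = valueOffsets[j] + encoded.get(j).length;
        }
    }

    /** Counts the elements of one row whose first byte equals {@code first}. */
    int countStartingWith(int row, byte first) {
        int count = 0;
        for (int j = listOffsets[row]; j < listOffsets[row + 1]; j++) {
            int start = valueOffsets[j];
            int end = valueOffsets[j + 1];
            // Absolute read: no byte[] copy and no per-element String/Text object.
            if (end > start && data.get(start) == first) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ListOfStringsScan col = new ListOfStringsScan(Arrays.asList(
                Arrays.asList("apple", "avocado", "banana"),
                Collections.<String>emptyList(),
                Arrays.asList("cherry", "apricot")));
        for (int row = 0; row < 3; row++) {
            System.out.println("row " + row + ": " + col.countStartingWith(row, (byte) 'a'));
        }
    }
}
```

Whether the real ListVector and VarCharVector expose their buffers in a way that makes this loop equally cheap is exactly the open question; the sketch only shows the shape of the iteration being asked for.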
