hi Wenjian -- I am not an expert in the Java library. Perhaps Bryan,
Li, Jacques, or Sidd can point you in the right direction. You can
take a look at the Dremio codebase to see more examples of Arrow in
action:

https://github.com/dremio/dremio-oss
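
For what it's worth, here is a rough sketch of the idea behind the
holder-based access you mention. It does not use the Arrow Java API at
all: ListOfStringsSketch, countWithPrefix and the plain int[]/byte[]
arrays are hypothetical stand-ins for the list offsets, the string
offsets and the data buffer. As far as I understand,
NullableVarCharHolder gives you the same kind of start/end positions
plus a reference to the underlying buffer, so the inner loop should
translate, but please verify that against the Java sources.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a list<string> column; NOT the Arrow API.
class ListOfStringsSketch {
    final int[] listOffsets;   // list i owns strings listOffsets[i] .. listOffsets[i+1]-1
    final int[] valueOffsets;  // string j owns bytes valueOffsets[j] .. valueOffsets[j+1]-1
    final byte[] data;         // all string bytes stored back to back

    ListOfStringsSketch(String[][] rows) {
        int total = 0;
        for (String[] row : rows) {
            total += row.length;
        }
        listOffsets = new int[rows.length + 1];
        valueOffsets = new int[total + 1];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int j = 0;
        for (int i = 0; i < rows.length; i++) {
            for (String s : rows[i]) {
                byte[] b = s.getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
                valueOffsets[++j] = out.size();  // end of string j-1, start of string j
            }
            listOffsets[i + 1] = j;
        }
        data = out.toByteArray();
    }

    // Counts the strings in one list that start with the given prefix.
    // Only offsets are read; no per-element byte[] or String is created.
    int countWithPrefix(int listIndex, byte[] prefix) {
        int count = 0;
        for (int j = listOffsets[listIndex]; j < listOffsets[listIndex + 1]; j++) {
            int start = valueOffsets[j];
            int end = valueOffsets[j + 1];
            if (end - start >= prefix.length && startsWith(start, prefix)) {
                count++;
            }
        }
        return count;
    }

    private boolean startsWith(int start, byte[] prefix) {
        for (int k = 0; k < prefix.length; k++) {
            if (data[start + k] != prefix[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ListOfStringsSketch v = new ListOfStringsSketch(new String[][] {
            {"apple", "avocado", "banana"}, {}, {"cherry", "apricot"}
        });
        byte[] a = "a".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < 3; i++) {
            System.out.println("list " + i + ": " + v.countWithPrefix(i, a));
        }
    }
}
```

The point is only that the filter walks the offsets and compares bytes
in place, which is what avoids the copy in get(int) and the Text
allocation in getObject(int).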

- Wes

On Tue, Aug 14, 2018 at 10:08 PM, Xu,Wenjian <[email protected]> wrote:
> Hi Wes,
>
> Thank you for your kind help.
>
> Actually, I am working on a Java UDF that iterates over an *array<string>*
> value in SQL.
>
> I understand that, in order to represent *array<string>* in Arrow format, I
> could use a ListVector with a VarCharVector as the inner vector. My question
> is: how can I efficiently access all the elements (i.e., each byte[] as a
> string)?
>
> By checking the test code:
> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestListVector.java
>
> one option is to use ListVector.getObject(int index) to get each
> ArrayList<Text>, and then iterate each element in ArrayList<Text>. But this
> method is expensive because:
>
> 1) it calls VarCharVector.get(int index), which involves a memory copy
> 2) it calls Text.set(byte[]), which assembles the Text from the byte array.
>
> My goal is just to retrieve each byte[] and do some filtering. Is there a
> less expensive way to achieve this? For example,
> VarCharVector.get(int index, NullableVarCharHolder holder) seems to be a
> less expensive operation. But how would I use this method in my case?
>
> Thanks again.
>
> Best regards,
> Wenjian
>
>
>
>
> On Wed, Aug 15, 2018 at 3:19 AM Wes McKinney <[email protected]> wrote:
>>
>> hi Wenjian,
>>
>> In C++ you can use ListBuilder together with UInt8Builder. There are
>> examples of using ListBuilder you can look at in
>> src/arrow/array-test.cc.
>>
>> For Java you might want to have a look at how Spark SQL converts its
>> Array<T> types into Arrow (there should be other examples in the Java
>> unit test suite, too):
>>
>>
>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
>>
>> - Wes
>>
>> On Mon, Aug 13, 2018 at 6:00 AM, Xu,Wenjian <[email protected]> wrote:
>> > Hi,
>> >
>> > If I want to create a list<list<byte>> structure (as shown in
>> > https://arrow.apache.org/docs/memory_layout.html), what class(es) do I
>> > need to use in the Java and C++ APIs?
>> >
>> > Any suggestion would be appreciated. Thanks.
>> >
>> > Best regards,
>> > Wenjian
