Hi Alberto,

Have you looked at how Arrow is used in Apache Spark? See

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala

and related modules.

On your first question, my understanding is that

* ArrowRecordBatch represents the in-memory record batch, and
* RecordBatch (in org.apache.arrow.flatbuf) is for the serialized record batch metadata, commonly called the "data header" (defined in Message.fbs).

- Wes

On Sun, Aug 5, 2018 at 9:13 AM, ALBERTO Bocchinfuso <[email protected]> wrote:
>
> Good morning,
>
> I have to use Apache Arrow with Scala, so I’m using the Java API from Scala,
> but I’m confused. I hope that someone can clarify a few things for me.
>
> First of all, what is the difference between ArrowRecordBatch (in
> org.apache.arrow.vector.ipc.message) and RecordBatch (in
> org.apache.arrow.flatbuf)?
> In this regard, if a coder wants to use Arrow just for IPC, should she
> consider only the classes in the package org.apache.arrow.vector, or should
> she also learn how to use the other packages, particularly io.netty.buffer,
> org.apache.arrow.memory and org.apache.arrow.flatbuf?
>
> I don’t understand how to do in Java everything that is done in Python
> in these documentation pages:
> http://arrow.apache.org/docs/python/data.html
> http://arrow.apache.org/docs/python/ipc.html
>
> I’d like to understand how I can create what in Python is called a
> RecordBatch, and serialize it to a stream, for example to write it to a file
> or elsewhere.
> I think an ArrowRecordBatch can be created with its constructors, once you
> have built a list of ArrowFieldNode (to be honest, I haven’t understood what
> this class stands for) and ArrowBuf (I haven’t understood how to create one;
> I think I should instantiate an ArrowByteBufAllocator through alloc(), but
> then I wouldn’t know how to proceed...), but I’m not sure.
> I hope someone can clear up my doubts.
>
> Thank you,
> Alberto
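To make the ArrowRecordBatch vs. flatbuf RecordBatch distinction more concrete, here is a minimal sketch (not taken from Spark or from this thread) that builds a small batch and unloads it into an `ArrowRecordBatch`. It assumes a recent Arrow Java release with `arrow-vector` plus a memory implementation such as `arrow-memory-netty` on the classpath; note that in newer releases `ArrowBuf` lives in `org.apache.arrow.memory` rather than `io.netty.buffer`. The class and method names (`RecordBatchInspect`, `describeFieldNodes`) are hypothetical. The point it illustrates is that you normally do not construct `ArrowFieldNode` or `ArrowBuf` by hand: vectors allocate their buffers from a `BufferAllocator`, and `VectorUnloader` produces the `ArrowRecordBatch`, with one `ArrowFieldNode` (value count and null count) per field, nested children getting their own nodes, plus a flat list of buffers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.message.ArrowFieldNode;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class RecordBatchInspect {
  /**
   * Builds a 3-row batch (an int column with one null and a string column),
   * unloads it into an ArrowRecordBatch and describes its nodes and buffers.
   */
  static List<String> describeFieldNodes() {
    Schema schema = new Schema(Arrays.asList(
        Field.nullable("id", new ArrowType.Int(32, true)),
        Field.nullable("name", ArrowType.Utf8.INSTANCE)));
    List<String> result = new ArrayList<>();
    // The root owns the vectors; the allocator owns all the memory behind them.
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
      IntVector ids = (IntVector) root.getVector("id");
      VarCharVector names = (VarCharVector) root.getVector("name");
      ids.allocateNew(3);
      ids.setSafe(0, 10);
      ids.setNull(1);
      ids.setSafe(2, 30);
      names.allocateNew();
      names.setSafe(0, "a".getBytes(StandardCharsets.UTF_8));
      names.setSafe(1, "b".getBytes(StandardCharsets.UTF_8));
      names.setSafe(2, "c".getBytes(StandardCharsets.UTF_8));
      root.setRowCount(3); // sets the value count of every vector

      // The unloaded batch holds references to the vectors' buffers, so close it too.
      try (ArrowRecordBatch batch = new VectorUnloader(root).getRecordBatch()) {
        // int: validity + data buffers; utf8: validity + offsets + data buffers
        result.add("rows=" + batch.getLength() + " buffers=" + batch.getBuffers().size());
        for (ArrowFieldNode node : batch.getNodes()) {
          result.add("length=" + node.getLength() + " nulls=" + node.getNullCount());
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    describeFieldNodes().forEach(System.out::println);
  }
}
```

Going the other way, `new VectorLoader(root).load(batch)` loads an `ArrowRecordBatch` into an existing `VectorSchemaRoot`. As far as I can tell, the flatbuf `RecordBatch` only comes into play when a batch is serialized as an IPC message, so IPC users rarely need `org.apache.arrow.flatbuf` directly. The same calls work unchanged from Scala.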
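For the question about creating a record batch and serializing it to a stream, a possible Java counterpart of the pyarrow streaming example is sketched below, under the same assumptions (a recent Arrow Java release; `ArrowStreamRoundTrip` and its methods are hypothetical names). Instead of assembling an `ArrowRecordBatch` by hand, it fills the vectors of a `VectorSchemaRoot` and lets `ArrowStreamWriter` serialize the root's current contents as one record batch; `ArrowStreamReader` then reads the stream back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class ArrowStreamRoundTrip {
  // Two nullable columns: a 32-bit signed int and a UTF-8 string.
  static final Schema SCHEMA = new Schema(Arrays.asList(
      Field.nullable("id", new ArrowType.Int(32, true)),
      Field.nullable("name", ArrowType.Utf8.INSTANCE)));

  /** Writes one 3-row record batch in the Arrow IPC streaming format. */
  static byte[] writeStream(BufferAllocator allocator) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (VectorSchemaRoot root = VectorSchemaRoot.create(SCHEMA, allocator);
         // null DictionaryProvider: no column in this schema is dictionary-encoded
         ArrowStreamWriter writer = new ArrowStreamWriter(root, null, out)) {
      IntVector ids = (IntVector) root.getVector("id");
      VarCharVector names = (VarCharVector) root.getVector("name");
      ids.allocateNew(3);
      names.allocateNew();
      String[] values = {"foo", "bar", "baz"};
      for (int i = 0; i < values.length; i++) {
        ids.setSafe(i, i + 1);
        names.setSafe(i, values[i].getBytes(StandardCharsets.UTF_8));
      }
      root.setRowCount(values.length);

      writer.start();      // schema message
      writer.writeBatch(); // the root's current contents as one record batch
      writer.end();        // end-of-stream marker
    }
    return out.toByteArray();
  }

  /** Reads the stream back and returns the total number of rows in all batches. */
  static int countRows(byte[] bytes, BufferAllocator allocator) throws IOException {
    int rows = 0;
    try (ArrowStreamReader reader =
             new ArrowStreamReader(new ByteArrayInputStream(bytes), allocator)) {
      VectorSchemaRoot root = reader.getVectorSchemaRoot(); // reused for every batch
      while (reader.loadNextBatch()) {
        rows += root.getRowCount();
      }
    }
    return rows;
  }

  /** Round-trips one batch through an in-memory stream; returns the rows read back. */
  static int roundTripRowCount() {
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
      return countRows(writeStream(allocator), allocator);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("rows read back: " + roundTripRowCount());
  }
}
```

To write the stream to a file, a `FileOutputStream` can replace the `ByteArrayOutputStream`. For the random-access file format, `ArrowFileWriter` and `ArrowFileReader` play the same roles, taking a channel such as `FileOutputStream.getChannel()`. To send several batches, refill the same root and call `writeBatch()` again before `end()`.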
