Hi Jayjeet,

I wonder if you really need to serialize the whole table into a single
buffer: you will end up with twice the memory footprint, when you could
be sending chunks as they are generated by the RecordBatchStreamWriter.
Also, is the buffer resized beforehand? If not, I'd suspect there are
reallocations happening under the hood.
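
For illustration, here is a minimal sketch of the streaming variant
(untested; network_sink stands in for whatever arrow::io::OutputStream
your transport provides):

    #include <arrow/api.h>
    #include <arrow/io/api.h>
    #include <arrow/ipc/writer.h>

    // Write record batches to the sink as they are produced, so only
    // one batch's worth of IPC data is buffered at a time instead of
    // a second ~700MB copy of the whole table.
    arrow::Status StreamTable(
        const std::shared_ptr<arrow::Table>& table,
        const std::shared_ptr<arrow::io::OutputStream>& network_sink) {
      ARROW_ASSIGN_OR_RAISE(
          auto writer,
          arrow::ipc::MakeStreamWriter(network_sink, table->schema()));
      arrow::TableBatchReader reader(*table);
      reader.set_chunksize(64 * 1024);  // rows per batch; tune for your link
      std::shared_ptr<arrow::RecordBatch> batch;
      while (true) {
        ARROW_RETURN_NOT_OK(reader.ReadNext(&batch));
        if (batch == nullptr) break;  // end of table
        ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
      }
      return writer->Close();
    }

If you do need a single contiguous buffer in the end, you can at least
avoid the reallocations by creating the sink with an initial capacity,
e.g. arrow::io::BufferOutputStream::Create(<estimated payload size>).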


Cheers,
Gosh

On Thu., 10 Jun. 2021, 21:01 Wes McKinney, <wesmck...@gmail.com> wrote:

> hi Jayjeet, have you run a profiler to see where those 1000ms are
> being spent? How many arrays are there in total (i.e. the sum of the
> number of chunks across all columns)? I would guess that the problem
> is all the little Buffer memcopies. I don't think the C Data
> Interface is going to help you.
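>
> For reference, counting the chunks is a short loop over the public
> Table API (a quick sketch):
>
>     // Sum the number of chunks across all columns of the table.
>     int64_t total_chunks = 0;
>     for (int i = 0; i < table->num_columns(); ++i) {
>       total_chunks += table->column(i)->num_chunks();
>     }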
>
> - Wes
>
> On Thu, Jun 10, 2021 at 1:48 PM Jayjeet Chakraborty
> <jayjeetchakrabort...@gmail.com> wrote:
> >
> > Hello Arrow Community,
> >
> > I am a student working on a project where I need to serialize an
> > in-memory Arrow Table of around 700MB to a uint8_t* buffer. I am
> > currently using the arrow::ipc::RecordBatchStreamWriter API to
> > serialize the table to an arrow::Buffer, but it takes nearly 1000ms
> > to serialize the whole table, and that is hurting the performance
> > of my performance-critical application. I basically want to get
> > hold of the underlying memory of the table as bytes and send it
> > over the network. How do you suggest I tackle this problem? I was
> > thinking of using the C Data Interface, converting my arrow::Table
> > to an ArrowArray and an ArrowSchema and serializing those structs
> > to send over the network, but it seems like serializing the structs
> > is a complex problem of its own. It would be great to have some
> > suggestions on this. Thanks a lot.
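> >
> > For context, the current serialization path looks roughly like the
> > following (a sketch; the exact code may differ):
> >
> >     ARROW_ASSIGN_OR_RAISE(auto sink,
> >                           arrow::io::BufferOutputStream::Create());
> >     ARROW_ASSIGN_OR_RAISE(
> >         auto writer,
> >         arrow::ipc::MakeStreamWriter(sink, table->schema()));
> >     ARROW_RETURN_NOT_OK(writer->WriteTable(*table));
> >     ARROW_RETURN_NOT_OK(writer->Close());
> >     // Materializes the whole ~700MB IPC payload in one buffer:
> >     ARROW_ASSIGN_OR_RAISE(auto buffer, sink->Finish());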
> >
> > Best,
> > Jayjeet
> >
>
