Jorge Leitão created ARROW-16118:
------------------------------------

             Summary: [C++] Reduce memory usage when writing to IPC
                 Key: ARROW-16118
                 URL: https://issues.apache.org/jira/browse/ARROW-16118
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Jorge Leitão


Writing a record batch to IPC ([header][buffers]) currently requires O(N*B) 
memory, where N is the average size of a buffer and B is the number of buffers.

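This is because writing the header requires each buffer's location (offset and 
length) and the total body size, which are only known after the buffers have been 
processed (e.g. compressed). All B processed buffers must therefore be held in 
memory before the header, and then the buffers, can be written.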
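
To make the memory cost concrete, here is a minimal Java sketch of the current, non-seeking approach. It is only an illustration under stated assumptions: it uses a hypothetical simplified layout (a header of {{count}} followed by {{offset}}/{{length}} pairs of longs, then the compressed buffers back to back) rather than Arrow's real flatbuffer metadata, and {{java.util.zip.Deflater}} stands in for the LZ4/ZSTD codecs Arrow IPC actually uses. Because the header comes first, every compressed buffer has to be kept alive until all lengths are known.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical simplified format, NOT the real Arrow IPC layout:
//   header = [count: long] then [offset: long, length: long] per buffer
//   body   = compressed buffers, back to back
class InMemoryIpcSketch {
    // Deflate is a stand-in for the codecs Arrow IPC uses (LZ4, ZSTD).
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Writes [header][buffers] to a non-seekable sink.
    // Returns how many compressed bytes had to be held in memory at once.
    static long writeBatch(OutputStream sink, List<byte[]> buffers) throws IOException {
        List<byte[]> compressed = new ArrayList<>();
        long held = 0;
        for (byte[] buffer : buffers) {
            byte[] c = compress(buffer);
            compressed.add(c); // kept alive: its length is needed before the header is written
            held += c.length;
        }
        DataOutputStream out = new DataOutputStream(sink);
        out.writeLong(compressed.size());
        long offset = 0;
        for (byte[] c : compressed) {
            out.writeLong(offset);
            out.writeLong(c.length);
            offset += c.length;
        }
        for (byte[] c : compressed) {
            out.write(c);
        }
        out.flush();
        return held; // the sum over all B buffers: O(N * B)
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> buffers = Arrays.asList(new byte[1 << 16], new byte[1 << 16], new byte[1 << 16]);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long held = writeBatch(sink, buffers);
        System.out.println("compressed bytes held before writing: " + held);
        System.out.println("message size: " + sink.size());
    }
}
{code}

The value returned by {{writeBatch}} is the total size of all compressed buffers, which is exactly the O(N*B) term described above.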

When the writer supports seeking, this memory usage can be reduced to O(N), 
where N is the average size of a primitive buffer over all fields: write a 
placeholder header, write the buffers one at a time while recording their 
locations, then seek back and overwrite the header. In pseudo-code:


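{code:java}
start = writer.tell()                          // current position: where the header begins
empty_locations = create_empty_header(schema)  // placeholder; must serialize to the same size as the final header
write_header(writer, empty_locations)
locations = write_buffers(writer, batch)       // process (e.g. compress) and write one buffer at a time, recording offsets and lengths
end = writer.tell()
writer.seek(start)
write_header(writer, locations)                // overwrite the placeholder in place
writer.seek(end)                               // continue writing after the message
{code}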
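
A minimal Java sketch of the seek-based pseudo-code, using {{java.io.RandomAccessFile}} as the seekable writer. As before, this is an illustration under assumptions, not Arrow's implementation: it uses a hypothetical simplified layout (a fixed-size header of {{count}} followed by {{offset}}/{{length}} pairs of longs, i.e. 8 + 16 * B bytes, then the compressed buffers) and {{java.util.zip.Deflater}} as a stand-in codec. The fixed header size is what allows the placeholder to be overwritten in place; with real Arrow metadata, the placeholder would likewise have to serialize to exactly the same number of bytes as the final header.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical simplified format, NOT the real Arrow IPC layout:
//   header = [count: long] then [offset: long, length: long] per buffer
//   body   = compressed buffers, back to back
class SeekingIpcSketch {
    // Deflate is a stand-in for the codecs Arrow IPC uses (LZ4, ZSTD).
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Fixed-size header: 8 + 16 * count bytes regardless of the values written.
    static void writeHeader(RandomAccessFile file, long[] offsets, long[] lengths) throws IOException {
        file.writeLong(offsets.length);
        for (int i = 0; i < offsets.length; i++) {
            file.writeLong(offsets[i]);
            file.writeLong(lengths[i]);
        }
    }

    // Writes [header][buffers] to a seekable file.
    // Returns the largest compressed buffer held in memory at any one time.
    static long writeBatch(RandomAccessFile file, List<byte[]> buffers) throws IOException {
        int count = buffers.size();
        long start = file.getFilePointer();                   // start = writer.tell()
        writeHeader(file, new long[count], new long[count]);  // all-zero placeholder header
        long bodyStart = file.getFilePointer();
        long[] offsets = new long[count];
        long[] lengths = new long[count];
        long peak = 0;
        for (int i = 0; i < count; i++) {
            byte[] c = compress(buffers.get(i));  // only this compressed buffer is alive
            offsets[i] = file.getFilePointer() - bodyStart;
            lengths[i] = c.length;
            file.write(c);
            peak = Math.max(peak, c.length);
        }
        long end = file.getFilePointer();
        file.seek(start);                     // go back to the placeholder...
        writeHeader(file, offsets, lengths);  // ...and overwrite it with the real locations
        file.seek(end);                       // continue after the message
        return peak;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("batch", ".bin");
        tmp.deleteOnExit();
        List<byte[]> buffers = Arrays.asList(new byte[1 << 16], new byte[1 << 16], new byte[1 << 16]);
        try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
            long peak = writeBatch(file, buffers);
            System.out.println("largest compressed buffer held: " + peak);
            System.out.println("message size: " + file.length());
        }
    }
}
{code}

Only one compressed buffer is alive at a time, so the peak is the largest single buffer rather than the sum of all of them. The {{offsets}} and {{lengths}} arrays add 16 bytes of metadata per buffer, which is negligible next to the buffer data.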

This has a significantly lower memory footprint: O(N) instead of O(N*B).

It could be interesting for the C++ implementation to support this.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
