[Discuss] Equivalence of IPC file and stream formats

Antoine Pitrou Tue, 17 Feb 2026 10:20:22 -0800


Hello,

The IPC file format is defined as the IPC stream format, preceded by a header (the Arrow magic bytes) and followed by a footer (a catalog of record batches, and the Arrow magic bytes). Thus, reading and writing IPC files can reuse the same basic building blocks as for IPC streams (this is almost trivial for writing, which is usually done sequentially).
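To make the framing concrete, here is a minimal Python sketch that treats the encoded stream and the footer as opaque byte strings. It assumes the layout from the Arrow columnar format specification: the 6-byte magic `ARROW1` padded to 8 bytes, the stream (ending with its end-of-stream marker), the Flatbuffers footer, a little-endian int32 footer length, and the magic again. The function names are hypothetical and not part of any Arrow library.

```python
import struct

MAGIC = b"ARROW1"
HEADER = MAGIC + b"\x00\x00"  # magic padded to an 8-byte boundary


def assemble_ipc_file(stream: bytes, footer: bytes) -> bytes:
    """Wrap already-encoded stream bytes and footer bytes in the IPC file framing."""
    return HEADER + stream + footer + struct.pack("<i", len(footer)) + MAGIC


def split_ipc_file(data: bytes) -> tuple[bytes, bytes]:
    """Return (stream_bytes, footer_bytes) from an IPC file, validating the framing."""
    if len(data) < len(HEADER) + 4 + len(MAGIC):
        raise ValueError("file too short")
    if not (data.startswith(HEADER) and data.endswith(MAGIC)):
        raise ValueError("missing Arrow magic bytes")
    # The int32 footer length sits just before the trailing magic.
    trailer_start = len(data) - len(MAGIC) - 4
    (footer_len,) = struct.unpack_from("<i", data, trailer_start)
    footer_start = trailer_start - footer_len
    if footer_len < 0 or footer_start < len(HEADER):
        raise ValueError("invalid footer length")
    return data[len(HEADER):footer_start], data[footer_start:trailer_start]
```

Reading the file "as a stream" then amounts to handing the bytes from offset 8 onwards to a stream reader, which stops at the end-of-stream marker and never looks at the footer.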

As a consequence, in practice an IPC file, minus its 8 header bytes, is a valid IPC stream that reads as the same logical contents.

However, there is no theoretical guarantee that this is always the case. Consider an IPC file writer that lists the record batches in the footer in reverse order compared to their sequential order in the underlying stream. Or, more generally, an IPC file footer that repeats or skips some batches in the stream.
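A toy model shows the gap. Below, a file is reduced to its record batches in stream order plus the footer's catalog, written as indices into that list (the real footer stores offsets and lengths in `Block` entries; indices are a simplifying assumption). A sequential stream reader ignores the footer while a file reader follows it, so a footer that reverses, repeats, or skips entries changes what the file reader returns. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToyIpcFile:
    """Hypothetical simplified model of an IPC file, for illustration only."""
    stream_batches: list[str]  # record batches in the order they appear in the stream
    footer_blocks: list[int]   # indices into stream_batches, in footer order


def read_as_stream(f: ToyIpcFile) -> list[str]:
    # A stream reader consumes messages sequentially and never looks at the footer.
    return list(f.stream_batches)


def read_as_file(f: ToyIpcFile) -> list[str]:
    # A file reader locates each batch through the footer catalog.
    return [f.stream_batches[i] for i in f.footer_blocks]


def footer_is_consistent(f: ToyIpcFile) -> bool:
    # Both readers agree iff the footer lists every stream batch exactly once, in order.
    return f.footer_blocks == list(range(len(f.stream_batches)))
```

For example, `ToyIpcFile(["b0", "b1", "b2"], [2, 1, 0])` is readable both ways, but the file reader yields the batches reversed.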

So, theoretically, we cannot assume that reading an IPC file as an IPC stream (after skipping the 8 header bytes) returns the intended contents.

However, it seems that it could be useful to be able to make such an assumption. Hence these questions:

1. Do all current IPC file writers uphold this assumption?
2. Do we want to make it a more explicit requirement of the IPC file format?

Context: I've submitted a PR (https://github.com/apache/arrow/pull/49312) to enable differential fuzzing in the C++ IPC file fuzzer, where I'm comparing the results of the IPC file and stream readers on the fuzzing payload.
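The shape of such a differential check can be sketched in Python, assuming two hypothetical reader callables that return the decoded batches or raise `ValueError` on invalid input. The error policy here (only compare when both readers succeed) is an assumption for illustration, not necessarily what the linked PR does.

```python
from typing import Callable, Sequence

# Hypothetical reader signature: bytes in, decoded batches out, ValueError on bad input.
Reader = Callable[[bytes], Sequence[object]]

FILE_HEADER_LEN = 8  # "ARROW1" magic plus 2 padding bytes


def differential_check(payload: bytes, read_file: Reader, read_stream: Reader) -> bool:
    """Return False only when both readers succeed but disagree on the contents."""
    try:
        from_file = read_file(payload)
        # The stream reader sees the same bytes minus the file header.
        from_stream = read_stream(payload[FILE_HEADER_LEN:])
    except ValueError:
        # Assumed policy: a payload rejected by either reader is not a disagreement.
        return True
    return list(from_file) == list(from_stream)
```

A stricter variant could also flag payloads that one reader accepts and the other rejects; whether that is a bug depends on exactly the question raised above.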


Regards

Antoine.
