Hi Antoine,

It is nice to hear from you!

> (I would perhaps also call it "application data" or something)

I’m happy with ApplicationData as the name.

> On the face of it, this looks like a reasonable idea, though I wonder if
> it should be a separate message type *or* an optional field carried
> together in RecordBatches.

The main issue with carrying this in RecordBatch metadata is ordering.
While IPC already supports `custom_metadata` via `write_batch` (which I’ve
been using), that approach assumes the application data can be attached to
a specific batch. In some cases, the application data and record batches
are produced independently and cannot be cleanly associated.

A concrete example is interleaving stderr output (arbitrary log messages)
with record batches written to stdout, while preserving a single ordered
IPC stream.

I experimented with using zero-row record batches as a workaround, but this
is inefficient: even with no rows, the serialized message size grows with
schema complexity. I’ve measured this across several schemas; details and
code are here:

https://gist.github.com/rustyconover/6ff8cbd93369735287d80ae60436379e

In short, zero-row batches can cost anywhere from ~120 bytes for simple
schemas to ~450+ bytes for more complex ones, which makes this approach
unattractive when trying to minimize bytes on the wire.

For these reasons, a distinct IPC message type for application data seems
like the cleanest solution. I’d be very interested in whether others have
run into the need for this as well.

Rusty

On Tue, Feb 3, 2026, at 5:58 PM, Antoine Pitrou wrote:
> Hi Rusty,
>
> Regards
>
> Antoine.
>
> On 03/02/2026 at 17:31, Rusty Conover wrote:
>> Hi Arrow Friends,
>>
>> I’ve really appreciated Arrow Flight’s ability to carry custom metadata
>> messages alongside record batches. In some of my current work, however, I’m
>> dealing with Arrow IPC streams that are *not* sent via Flight, and I’d like
>> to have a comparable capability there as well.
>>
>> To support this, I’d like to propose adding a new IPC message
>> type, tentatively named `OpaqueBytes`, that would allow arbitrary bytes to
>> be embedded directly within IPC streams. IPC readers that do not understand
>> this message type could safely ignore it, preserving compatibility.
>>
>> My motivation is to enable multiplexing of auxiliary messages within a
>> stream that otherwise consists of schemas, dictionaries, and record batches.
>> A concrete example would be interleaving logging or signaling messages with
>> record batches. Today, I’m approximating this by emitting zero-row record
>> batches with binary metadata, but this approach is awkward and incurs
>> unnecessary overhead due to schema complexity.
>>
>> An `OpaqueBytes` IPC message type could enable a range of use cases,
>> including (but not limited to) logging, flow control, signaling, and other
>> auxiliary communication needs that don’t naturally map to record batches.
>>
>> I briefly discussed this idea a few weeks ago on the Apache Arrow call, but
>> wanted to share it here to reach a broader audience and gather more feedback.
>>
>> In addition to the message type itself, I’d also be interested in hearing
>> thoughts on how PyArrow’s interfaces might be extended to allow users to
>> read and write these arbitrary messages as part of existing IPC stream
>> readers and writers.
>>
>> Looking forward to your thoughts and discussion.
>>
>> Kind regards,
>> Rusty
