Hi all,

Thanks for the thoughtful discussion; this has been really helpful to
follow.

It seems to me that there are two slightly different (but related) needs
being discussed:

   1. The ability to carry *opaque, non-string bytes* in IPC, avoiding the
      overhead and semantic mismatch of zero-row RecordBatches and the
      string-only limitation of custom_metadata.

   2. The ability to control the *association* and *ordering* of such data
      with respect to RecordBatch messages (sometimes tightly coupled,
      sometimes intentionally independent).

From that perspective, the combination Antoine suggested of a lightweight
empty message type plus an application_data bytes field feels like a nice
decomposition:

   - Empty provides a low-overhead, ordering-preserving carrier when the
     data is intentionally independent.

   - An application_data field on Message allows attaching bytes directly
     to RecordBatch (or other messages) when tight association is desired
     (e.g., statistics, per-batch signals).

This also seems to align well with David’s point about avoiding base64 and
with Dewey’s use cases where the payload is meaningful but doesn’t
naturally fit schema metadata.
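
To make the base64 point concrete, here is a small Python sketch of what it
costs to push binary data through a string-only value. The 300-byte random
payload is purely an assumption for illustration; it stands in for any
opaque application bytes.

```python
import base64
import os

# Hypothetical opaque payload (300 random bytes), standing in for any
# application-defined binary data such as serialized statistics.
payload = os.urandom(300)

# A string-only metadata value forces a text encoding of the bytes first.
encoded = base64.b64encode(payload).decode("ascii")

# base64 emits 4 output characters per 3 input bytes, padded to a multiple of 4.
expected_len = 4 * ((len(payload) + 2) // 3)
assert len(encoded) == expected_len  # 400 characters for 300 bytes

# The round trip is lossless, but it costs an encode and a decode per message.
assert base64.b64decode(encoded) == payload
```

A native bytes field would carry the same 300 bytes as-is, without the
roughly one-third size increase or the extra encode/decode step.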

One thing I like about this direction is that it keeps the initial scope
focused: it doesn’t force multiplexing or structured interpretation up
front but still leaves room to experiment (e.g., embedding serialized IPC
in Empty or evolving higher-level conventions later).

From an implementation point of view, it also seems feasible to prototype
incrementally:

   - introduce Empty + application_data(bytes) in the IPC format

   - initially treat application_data as opaque and pass-through in
     readers/writers

   - let higher-level libraries decide how (or whether) to interpret it
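
As a rough illustration of the steps above, here is a toy Python model of the
pass-through behaviour. To be clear, none of this is the Arrow API or the
actual Flatbuffers schema: the Message class, its field names, and the two
helper functions are all hypothetical, and only meant to show the ordering
and association semantics being discussed.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


# Toy in-memory stand-in for an IPC message. This is NOT the Arrow API;
# all names and fields here are assumptions made for illustration.
@dataclass(frozen=True)
class Message:
    kind: str                      # "RecordBatch" or the proposed "Empty"
    body: bytes = b""              # batch buffers (left empty for Empty)
    application_data: bytes = b""  # proposed opaque bytes field


def pass_through(stream: Iterable[Message]) -> Iterator[Message]:
    """Model a reader/writer pair that forwards messages in order, untouched."""
    for msg in stream:
        # application_data is never inspected or decoded at this layer.
        yield msg


def batch_annotations(stream: Iterable[Message]) -> List[bytes]:
    """A higher-level consumer that opts in to reading per-batch bytes."""
    return [m.application_data for m in stream if m.kind == "RecordBatch"]


stream = [
    # Tightly coupled: bytes travel with the batch they describe.
    Message("RecordBatch", body=b"<batch 0>", application_data=b"\x00stats-0"),
    # Intentionally independent: bytes ride in an Empty between batches.
    Message("Empty", application_data=b"\xffheartbeat"),
    # A batch with nothing attached.
    Message("RecordBatch", body=b"<batch 1>"),
]

received = list(pass_through(stream))
assert received == stream  # ordering and bytes are preserved
assert batch_annotations(received) == [b"\x00stats-0", b""]
```

The point of the sketch is only that the transport layer can stay ignorant
of the payload, while interpretation remains an opt-in decision higher up.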

Happy to help with prototyping or reviewing pieces of this if that’s useful.

Best regards,
Vignesh

On Thu, 5 Feb 2026 at 14:17, Antoine Pitrou <[email protected]> wrote:

>
> Le 05/02/2026 à 04:44, Dewey Dunnington a écrit :
> >> a new application_data field in the Message table to pass arbitrary
> >> opaque data with any kind of message
> >
> > I believe this could be done with the Empty message by putting the bytes
> > in the body instead of in the header. Probably the only place this
> > functionally makes a difference would be dissociated IPC where the body
> > is transported separately. Perhaps both are useful.
>
> I thought about that, but then it means the application data can only be
> transmitted in the new Empty message, not with a RecordBatch.
>
> That's not necessarily a problem, just a limitation to think about.
>
> >> It could be interesting to support multiplexing multiple IPC streams
> >> over the same socket
> >
> > I agree that there are some applications of the Empty where it would be
> > tempting to have the payload of the Empty be serialized IPC (e.g., if
> > used for statistics and the statistics are encoded in the lovely Arrow
> > spec we have for that). Perhaps with Empty one could prototype that.
>
> I hope we can find a way of transporting statistics together with the
> corresponding RecordBatch message, as opposed to a separate message in
> the IPC stream.
>
> Regards
>
> Antoine.
>
>
