Hey Jeremy,

Currently the first message of an IPC stream is a Schema message, which
consists solely of a flatbuffer message and is defined in the Schema.fbs
file of the Arrow repo. All of the libraries that can read Arrow IPC should
also be able to convert a single IPC Schema message back into an Arrow
schema without issue. Would that be sufficient for you?

On Mon, Jul 8, 2024 at 11:12 AM Jeremy Leibs <jer...@rerun.io> wrote:

> I'm looking for any advice folks may have on a generic way to document and
> represent expected arrow schemas as part of an interface definition.
>
> For context, our library provides a cross-language (python, c++, rust) SDK
> for logging semantic multi-modal data (point clouds, images, geometric
> transforms, bounding boxes, etc.). Each of these primitive types has an
> associated arrow schema, but to date we have largely abstracted that from
> our users through language-native object types, and a bunch of generated
> code to "serialize" stuff into the arrow buffers before transmitting via
> our IPC.
>
> We're trying to take steps in the direction of making it easier for
> advanced users to write and read data from the store directly using arrow,
> without needing to go in-and-out of an intermediate object-oriented
> representation. However, doing this means documenting to users, for
> example: "This is the arrow schema to use when sending a point cloud with a
> color channel".
>
> I would love it if, eventually, the arrow project had a way of defining a
> spec file similar to a .proto or a .fbs, with all libraries supporting
> loading of a schema object by directly parsing the spec. Has anyone taken
> steps in this direction?
>
> The best alternative I have at the moment is to redundantly define the
> schema for each of the 3 languages implicitly by directly providing the
> code to construct a datatype instance with the correct schema. But this
> feels unfortunately messy and hard to maintain.
>
> Thanks,
> Jeremy
>
