Unlike programming language type definitions, .proto (and FlatBuffers)
files are artifacts that you ship to consumers of the API they define.
You only ever append to these definitions. If a field is removed, its field
number must never be reused, and so on.
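
To make the append-only rule concrete, here is a rough sketch of a
compatibility check in Python. The dict-based schema representation and the
function name check_append_only are made up for illustration; this is not
part of protobuf, FlatBuffers, or any Arrow tooling, and it is stricter than
the real rules (some type changes are in fact wire-compatible).

```python
# Toy model (hypothetical): a schema version is a dict mapping
# field number -> (name, type), plus a set of retired ("reserved") numbers.

def check_append_only(old_fields, old_reserved, new_fields, new_reserved):
    """Return a list of ways the new version breaks the append-only rule."""
    problems = []
    for number, (name, ftype) in sorted(old_fields.items()):
        if number in new_fields:
            new_type = new_fields[number][1]
            # Conservative: treat any type change as a break.
            if new_type != ftype:
                problems.append(f"field {number} changed type: {ftype} -> {new_type}")
        elif number not in new_reserved:
            problems.append(f"field {number} ({name}) was removed without being reserved")
    for number in sorted(old_reserved):
        if number in new_fields:
            problems.append(f"field {number} reuses a retired number")
        elif number not in new_reserved:
            problems.append(f"retired number {number} is no longer reserved")
    return problems


v1 = {1: ("name", "string"), 2: ("age", "int32")}
# v2 drops "age" (number 2 becomes reserved) and appends "email" as number 3.
v2 = {1: ("name", "string"), 3: ("email", "string")}
# v3 makes the mistake of handing number 2 to a new field.
v3 = {1: ("name", "string"), 2: ("nickname", "string"), 3: ("email", "string")}

ok = check_append_only(v1, set(), v2, {2})
bad = check_append_only(v2, {2}, v3, set())
```

Here ok is empty because v2 only appends and retires, while bad reports the
reuse of field number 2.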

So one should not stress so much about syncing these files. It’s actually a
good thing that a version in a specific arrow-LANG repository does not
reflect all the latest developments of the spec but only what that specific
implementation currently understands.
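
As an illustration of why an "outdated" copy is harmless, here is a minimal
sketch of a reader that only knows an older set of fields and decodes a
message written against a newer one. It hand-rolls just the varint and
length-delimited parts of the protobuf wire encoding and handles only
non-negative integers and strings. The helper names and the example fields
are hypothetical, and real generated code does much more than this.

```python
def encode_varint(value):
    """Base-128 varint, least significant group first (non-negative ints only)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_message(fields):
    """fields: list of (field_number, value); int -> wire type 0, str -> wire type 2."""
    out = bytearray()
    for number, value in fields:
        if isinstance(value, int):
            out += encode_varint((number << 3) | 0)
            out += encode_varint(value)
        else:
            data = value.encode("utf-8")
            out += encode_varint((number << 3) | 2)
            out += encode_varint(len(data))
            out += data
    return bytes(out)


def decode_message(buf, known):
    """known: field number -> name, i.e. what this reader's copy of the schema understands."""
    result, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            result[known[number]] = value
        # Field numbers this reader does not know are simply skipped.
    return result


# A newer writer sends field 3, which the older reader has never heard of.
payload = encode_message([(1, "alice"), (2, 30), (3, "alice@example.com")])
old_reader_schema = {1: "name", 2: "age"}
decoded = decode_message(payload, old_reader_schema)
```

The old reader ends up with just name and age, and the unknown field 3 is
ignored rather than causing an error.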

Forcing a synchronization creates more problems than it solves. It’s a
counter-intuitive conclusion because few things in programming are
designed with backwards compatibility in mind the way these protocol
definition languages are.

(For many years, I shipped mobile apps containing .proto definitions used
for communication with servers and local persistence. Every week there’s a
copy of the app on millions of phones with code generated from “outdated”
.proto files. It works really well if you’re always keeping backwards
compatibility in mind.)

—
Felipe

On Wed, 20 Aug 2025 at 01:40 Adam Reeve <adre...@gmail.com> wrote:

> Hi everyone,
>
> As part of creating the new arrow-dotnet repository, the contents of the
> format directory from the main arrow repository had to be copied [1]. This
> contains language agnostic flatbuffer and protobuf definitions for the
> Arrow IPC and Flight formats that can be used to generate code. Both the
> arrow-rs [2] and arrow-java [3] repositories also contain copies of these
> files that have to be manually updated when there are format changes.
>
> It appears that other implementations check in generated code rather than
> generate code at build time, so don't need to store the original
> definitions (at least arrow-go [4] and arrow-swift [5] do this, I haven't
> looked closely at all implementations).
>
> I wonder whether it would simplify processes if there was a shared
> arrow-format repository to store these files, which could be included as a
> git submodule in other repositories, similar to how the arrow-testing and
> parquet-testing repositories are used. This would make it easy to see
> whether the format files are up to date, and prevent potential divergence
> between implementations.
>
> On the other hand, these format files aren't updated frequently and git
> submodules add extra developer friction. They aren't checked out by default
> when cloning for example, and changes that cross repository boundaries
> require extra coordination.
>
> What do people think of this idea? Would it be worth setting up a new
> arrow-format repository?
>
> Thanks,
> Adam
>
> [1]: https://github.com/apache/arrow-dotnet/pull/17
> [2]: https://github.com/apache/arrow-rs/tree/main/format
> [3]: https://github.com/apache/arrow-java/tree/main/arrow-format
> [4]: https://github.com/apache/arrow-go/blob/a661aa4711c27a065907512c69bf2e9d3454b936/arrow/internal/flatbuf/Schema.go#L17
> [5]: https://github.com/apache/arrow-swift/blob/99275981ac54ab25a9f308f6182acf571385bda6/Arrow/Sources/Arrow/Schema_generated.swift#L18
>
