Unlike type definitions in a programming language, .proto (and FlatBuffers) files are artifacts that you ship to consumers of the API they define. You only ever append to these definitions. If a field is removed, its field number (or index) must never be reused, and so on.
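To make the append-only point concrete, here is a minimal sketch in Python of how a reader built from an older definition copes with a message written against a newer one. It hand-rolls only the varint and length-delimited parts of the protobuf wire format. The field names (`id`, `name`, `flags`) and the helpers `encode_message` and `decode_known` are made up for illustration and are not part of Arrow or of any protobuf library. Real generated code is more complete, and typically preserves unknown fields rather than dropping them as this sketch does.

```python
# Sketch of the protobuf wire format: every field is a varint tag,
# (field_number << 3) | wire_type, followed by its payload.
# Only wire type 0 (varint) and 2 (length-delimited) are handled here.

def encode_varint(value):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_message(fields):
    """Encode {field_number: int or bytes} (hypothetical helper)."""
    out = bytearray()
    for number, value in fields.items():
        if isinstance(value, int):
            out += encode_varint((number << 3) | 0) + encode_varint(value)
        else:
            out += encode_varint((number << 3) | 2)
            out += encode_varint(len(value)) + value
    return bytes(out)


def decode_known(buf, known):
    """Decode buf, keeping only field numbers listed in `known`.

    Fields the reader has never heard of are skipped, which is why an
    old reader keeps working after new fields are appended.
    """
    pos = 0
    message = {}
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        if number in known:
            message[known[number]] = value
    return message


# Hypothetical schemas: the new one only appends field 3.
OLD_SCHEMA = {1: "id", 2: "name"}
NEW_SCHEMA = {1: "id", 2: "name", 3: "flags"}

payload = encode_message({1: 42, 2: b"arrow", 3: 7})
old_view = decode_known(payload, OLD_SCHEMA)  # field 3 is skipped
new_view = decode_known(payload, NEW_SCHEMA)  # field 3 is understood
```

The old reader skips field 3 because it does not know that number. If number 3 were later reused for a different meaning, an old reader that did know the original field 3 would silently misread it, which is why removed numbers are never reused.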
So one should not stress so much about syncing these files. It’s actually a good thing that a version in a specific arrow-LANG repository does not reflect all the latest developments of the spec but only what that specific implementation currently understands. Forcing a synchronization creates more problems than it solves.

It’s a counter-intuitive conclusion because not many things in programming are designed with backwards compatibility in mind like these protocol definition languages.

(For many years, I shipped mobile apps containing .proto definitions used for communication with servers and local persistence. Every week there’s a copy of the app on millions of phones with code generated from “outdated” .proto files. It works really well if you’re always keeping backwards compatibility in mind.)

-- Felipe

On Wed, 20 Aug 2025 at 01:40 Adam Reeve <adre...@gmail.com> wrote:

> Hi everyone,
>
> As part of creating the new arrow-dotnet repository, the contents of the
> format directory from the main arrow repository had to be copied [1]. This
> contains language agnostic flatbuffer and protobuf definitions for the
> Arrow IPC and Flight formats that can be used to generate code. Both the
> arrow-rs [2] and arrow-java [3] repositories also contain copies of these
> files that have to be manually updated when there are format changes.
>
> It appears that other implementations check in generated code rather than
> generate code at build time, so don't need to store the original
> definitions (at least arrow-go [4] and arrow-swift [5] do this, I haven't
> looked closely at all implementations).
>
> I wonder whether it would simplify processes if there was a shared
> arrow-format repository to store these files, which could be included as a
> git submodule in other repositories, similar to how the arrow-testing and
> parquet-testing repositories are used. This would make it easy to see
> whether the format files are up to date, and prevent potential divergence
> between implementations.
>
> On the other hand, these format files aren't updated frequently and git
> submodules add extra developer friction. They aren't checked out by default
> when cloning for example, and changes that cross repository boundaries
> require extra coordination.
>
> What do people think of this idea? Would it be worth setting up a new
> arrow-format repository?
>
> Thanks,
> Adam
>
> [1]: https://github.com/apache/arrow-dotnet/pull/17
> [2]: https://github.com/apache/arrow-rs/tree/main/format
> [3]: https://github.com/apache/arrow-java/tree/main/arrow-format
> [4]: https://github.com/apache/arrow-go/blob/a661aa4711c27a065907512c69bf2e9d3454b936/arrow/internal/flatbuf/Schema.go#L17
> [5]: https://github.com/apache/arrow-swift/blob/99275981ac54ab25a9f308f6182acf571385bda6/Arrow/Sources/Arrow/Schema_generated.swift#L18