Hi everyone,

As part of creating the new arrow-dotnet repository, the contents of the format directory from the main arrow repository had to be copied [1]. This directory contains language-agnostic Flatbuffers and Protobuf definitions for the Arrow IPC and Flight formats that can be used to generate code. Both the arrow-rs [2] and arrow-java [3] repositories also contain copies of these files, which have to be updated manually whenever the format changes.
It appears that other implementations check in generated code rather than generating code at build time, so they don't need to store the original definitions (at least arrow-go [4] and arrow-swift [5] do this; I haven't looked closely at all implementations).

I wonder whether it would simplify processes if there were a shared arrow-format repository to store these files, which could be included as a git submodule in other repositories, similar to how the arrow-testing and parquet-testing repositories are used. This would make it easy to see whether the format files are up to date, and would prevent potential divergence between implementations.

On the other hand, these format files aren't updated frequently, and git submodules add extra developer friction. For example, they aren't checked out by default when cloning, and changes that cross repository boundaries require extra coordination.

What do people think of this idea? Would it be worth setting up a new arrow-format repository?

Thanks,
Adam

[1]: https://github.com/apache/arrow-dotnet/pull/17
[2]: https://github.com/apache/arrow-rs/tree/main/format
[3]: https://github.com/apache/arrow-java/tree/main/arrow-format
[4]: https://github.com/apache/arrow-go/blob/a661aa4711c27a065907512c69bf2e9d3454b936/arrow/internal/flatbuf/Schema.go#L17
[5]: https://github.com/apache/arrow-swift/blob/99275981ac54ab25a9f308f6182acf571385bda6/Arrow/Sources/Arrow/Schema_generated.swift#L18