jacobsimpson commented on PR #7009: URL: https://github.com/apache/arrow-rs/pull/7009#issuecomment-2620386539
> > I'll continue to look into this, however at the moment it appears that we can't depend on one machine being able to produce the same file descriptor sets as another machine. > > Can the descriptor be used by a machine on which it wasn't created 🤔 It is my understanding that it is not the _machine_ where it was created that determines compatibility, it is the `protoc` compiler. There `descriptor.proto` can change even in [patch versions](https://protobuf.dev/support/version-support/#changes). I think, as long as the file descriptors are generated at the same time as the other generated code, it should all work together correctly. Compatibility alone is not enough to pass the build though. The way I understand it, no two machines/protoc-versions/whatever-additional-unspecified-conditions are guaranteed to produce the [same proto serializations](https://protobuf.dev/programming-guides/serialization-not-canonical/#deterministic-is-not-canonical). And the `diff` step means we depend on them being the same, not just compatible. > Maybe we would have to always regenerate this file on build (rather than have it checked in) > > That comes with its own problems (e.g. increasing the reuqired depenednecies to build arrow-flight) According to the [CONTRIBUTING](https://github.com/apache/arrow-rs/blob/0c07ec79cd4b28e7aa9d15d1d58b5c5adafb6855/arrow-flight/CONTRIBUTING.md) documentation, the current solution (where the change author required to run `protoc` and commit generated files) is there because requiring the toolchain be present for _all_ source code consumers was difficult. So, having the build always generate the proto files is a possibility, however it looks like it would be a behavior regression (so to speak). Two other possibilities: 2. Change the Github check so those two files are excluded from the 'up-to-date' check. Basically, have faith that if the change author correctly regenerated the other proto files, then the file descriptor set files were also correctly regenerated. I think there would be a lot of trust in this option. However, it would be easy. Something like (untested): `git diff --exit-code -- . ':(exclude)*.bin'` [here](https://github.com/apache/arrow-rs/blob/2bd43e99da6258926927cf2904e1eac815542b8f/.github/workflows/arrow_flight.yml#L77) 3. Change/add a Github action to regenerate the file descriptor set files (or even all proto files) and add the generated files to the PR. _If_ it's possible to give that behavior to a Github action, I think it would make for an awkward contributor experience, the remote branch would have changes even when the author is the only one working on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
