jacobsimpson commented on PR #7009:
URL: https://github.com/apache/arrow-rs/pull/7009#issuecomment-2620386539

   > > I'll continue to look into this, however at the moment it appears that 
we can't depend on one machine being able to produce the same file descriptor 
sets as another machine.
   > 
   > Can the descriptor be used by a machine on which it wasn't created 🤔
   
   It is my understanding that it is not the _machine_ where it was created 
that determines compatibility, it is the `protoc` compiler. There 
`descriptor.proto` can change even in [patch 
versions](https://protobuf.dev/support/version-support/#changes). I think, as 
long as the file descriptors are generated at the same time as the other 
generated code, it should all work together correctly.
   
   Compatibility alone is not enough to pass the build though. The way I 
understand it, no two 
machines/protoc-versions/whatever-additional-unspecified-conditions are 
guaranteed to produce the [same proto 
serializations](https://protobuf.dev/programming-guides/serialization-not-canonical/#deterministic-is-not-canonical).
 And the `diff` step means we depend on them being the same, not just 
compatible.
   
   > Maybe we would have to always regenerate this file on build (rather than 
have it checked in)
   >
   > That comes with its own problems (e.g. increasing the reuqired 
depenednecies to build arrow-flight)
   
   According to the 
[CONTRIBUTING](https://github.com/apache/arrow-rs/blob/0c07ec79cd4b28e7aa9d15d1d58b5c5adafb6855/arrow-flight/CONTRIBUTING.md)
 documentation, the current solution (where the change author required to run 
`protoc` and commit generated files) is there because requiring the toolchain 
be present for _all_ source code consumers was difficult. So, having the build 
always generate the proto files is a possibility, however it looks like it 
would be a behavior regression (so to speak).
   
   Two other possibilities:
   2. Change the Github check so those two files are excluded from the 
'up-to-date' check. Basically, have faith that if the change author correctly 
regenerated the other proto files, then the file descriptor set files were also 
correctly regenerated. I think there would be a lot of trust in this option. 
However, it would be easy. Something like (untested): `git diff --exit-code -- 
. ':(exclude)*.bin'` 
[here](https://github.com/apache/arrow-rs/blob/2bd43e99da6258926927cf2904e1eac815542b8f/.github/workflows/arrow_flight.yml#L77)
   3. Change/add a Github action to regenerate the file descriptor set files 
(or even all proto files) and add the generated files to the PR. _If_ it's 
possible to give that behavior to a Github action, I think it would make for an 
awkward contributor experience, the remote branch would have changes even when 
the author is the only one working on it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to