jcsherin opened a new issue, #10979: URL: https://github.com/apache/datafusion/issues/10979
### Describe the bug Running `regen.sh` in `datafusion-proto` fails with the error: ```sh $ ./regen.sh Error: "protobuf compilation failed: protoc failed: datafusion/proto/proto/datafusion.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set.\n" ``` The error happens with libprotoc 3.12.4, which is the version recommended in the [DataFusion docs](https://datafusion.apache.org/contributor-guide/getting_started.html#protoc-installation). This is also the default version installed by apt in Ubuntu 22.04. ```sh $ protoc --version libprotoc 3.12.4 ``` A similar bug was reported in #8602. I needed to run `regen.sh` while trying to migrate `string_agg`, one of the built-in aggregate functions pending migration, listed in #8708. **Could you recommend an approach to fix this? I have listed 3 possible ways here.** ### 1. Add flag [Satisfy the experimental check](https://github.com/protocolbuffers/protobuf/blob/main/docs/implementing_proto3_presence.md#satisfying-the-experimental-check) by passing `--experimental_allow_proto3_optional` to the `protoc` compiler invocation. I've verified locally that this will work. ```rust // In datafusion/proto/gen/src/main.rs L31 prost_build::Config::new() .protoc_arg("--experimental_allow_proto3_optional") ``` ### 2. Convert to oneof Rewrite `optional` field as a `oneof` with a single field. Currently, in `datafusion.proto` there are two usages of `optional`: ```protobuf message ScalarUDFExprNode { string fun_name = 1; repeated LogicalExprNode args = 2; optional bytes fun_definition = 3; } message PhysicalScalarUdfNode { string name = 1; repeated PhysicalExprNode args = 2; optional bytes fun_definition = 3; datafusion_common.ArrowType return_type = 4; } ``` We can rewrite the `optional` field as a `oneof` with a single field like this: ```protobuf message ScalarUDFExprNode { string fun_name = 1; repeated LogicalExprNode args = 2; oneof fun_definition_opt { bytes fun_definition = 3; } } ``` This technique is already used in other message types in `datafusion.proto` like - [`WindowFrame`](https://github.com/apache/datafusion/blob/b26c1b819dff7ed1b48ab20c66b4bd1226ff8d79/datafusion/proto/proto/datafusion.proto#L628-L630), `CsvScanExecNode`, `ProjectionNode` etc. I've tried this change in my fork and verified that it works. [Please see the diff here](https://github.com/apache/datafusion/compare/main...jcsherin:datafusion:fixes-proto-regen) ### 3. Remove optional We could remove the `optional` keyword usage here, which would fix the issue. It might however have other cascading effects. To evaluate it, it might be useful to see #8706, where the message type in question was designed. ### To Reproduce You need `protoc` version `3.12.4`. ### Expected behavior Running `regen.sh` should exit without errors when using `protoc` version `3.12.4`. ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org