jcsherin opened a new issue, #10979:
URL: https://github.com/apache/datafusion/issues/10979

   ### Describe the bug
   
   Running `regen.sh` in `datafusion-proto` fails with the error:
   
   ```sh
   $ ./regen.sh
   Error: "protobuf compilation failed: protoc failed: 
datafusion/proto/proto/datafusion.proto: This file contains proto3 optional 
fields, but --experimental_allow_proto3_optional was not set.\n"
   ```
   
   The error happens with libprotoc 3.12.4, which is the version recommended in 
the [DataFusion 
docs](https://datafusion.apache.org/contributor-guide/getting_started.html#protoc-installation).
 This is also the default version installed by apt in Ubuntu 22.04.
   
   ```sh
   $ protoc --version
   libprotoc 3.12.4
   ```
   
   A similar bug was reported in #8602.
   
   I needed to run `regen.sh` while trying to migrate `string_agg`, one of the 
built-in aggregate functions pending migration, listed in #8708. 
   
   **Could you recommend an approach to fix this? I have listed 3 possible ways 
here.**
   
   ### 1. Add flag
   
   [Satisfy the experimental 
check](https://github.com/protocolbuffers/protobuf/blob/main/docs/implementing_proto3_presence.md#satisfying-the-experimental-check)
 by passing `--experimental_allow_proto3_optional` to the `protoc` compiler 
invocation. I've verified locally that this will work.
   ```rust
   // In datafusion/proto/gen/src/main.rs L31
   prost_build::Config::new()
        .protoc_arg("--experimental_allow_proto3_optional")
   ``` 
   
   ### 2. Convert to oneof
   
   Rewrite `optional` field as a `oneof` with a single field.
   
   Currently, in `datafusion.proto` there are two usages of `optional`:
   ```protobuf
   message ScalarUDFExprNode {  
     string fun_name = 1;  
     repeated LogicalExprNode args = 2;  
     optional bytes fun_definition = 3;  
   }
   
   message PhysicalScalarUdfNode {  
     string name = 1;  
     repeated PhysicalExprNode args = 2;  
     optional bytes fun_definition = 3;  
     datafusion_common.ArrowType return_type = 4;  
   }
   ```
   
   We can rewrite the `optional` field as a `oneof` with a single field like 
this:
   
   ```protobuf
   message ScalarUDFExprNode {  
     string fun_name = 1;  
     repeated LogicalExprNode args = 2;  
     oneof fun_definition_opt {  
       bytes fun_definition = 3;  
     }  
   }
   ```
   
   This technique is already used in other message types in `datafusion.proto` 
like - 
[`WindowFrame`](https://github.com/apache/datafusion/blob/b26c1b819dff7ed1b48ab20c66b4bd1226ff8d79/datafusion/proto/proto/datafusion.proto#L628-L630),
 `CsvScanExecNode`, `ProjectionNode` etc.
   
   I've tried this change in my fork and verified that it works. [Please see 
the diff 
here](https://github.com/apache/datafusion/compare/main...jcsherin:datafusion:fixes-proto-regen)
   
   ### 3. Remove optional
   
   We could remove the `optional` keyword usage here, which would fix the 
issue. It might however have other cascading effects. To evaluate it, it might 
be useful to see #8706, where the message type in question was designed. 
   
   
   ### To Reproduce
   
   You need `protoc` version `3.12.4`.
   
   ### Expected behavior
   
   Running `regen.sh` should exit without errors when using `protoc` version 
`3.12.4`.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to