timsaucer commented on PR #11550: URL: https://github.com/apache/datafusion/pull/11550#issuecomment-2242694088
Copying @jayzhan211 's comment from > I had tried to directly modify Expr before, but ends up builder API. I couldn't remember the reason My *guess* is that it has to do with all of the other functions returning `Expr` rather than `Result<Expr>` and to fit neatly in with the rest of the code base, you wanted one point of checking? Aside from that for Aggregate I don't see a strong need to have the builder. Actually, window functions have a better argument for them with the fact that you need to (sometimes) build the window frame if it is not specified. The reasons I can see sticking with a builder function instead of updating the `Expr` directly: 1. Minimal change to the API. It *should* only require updating the import since all of the signatures in `AggregateExt` are identical in `ExprFunctionExt` and there are only new methods added in. We could even keep `AggregateExt` and mark it deprecated for a release or two. 2. Builder allows checking for errors at one point instead of needing to do `?` after every operation. Right now there are two types of errors in `AggregateExt`, calling one of the functions on an Expr that is not an aggregate function or setting and `order_by` expression that is not a `Sort`. 3. As mentioned above, it actually does make setting up the window functions cleaner. With the window functions, if the user specifies a window frame we want to use that. If not, we want to create it based on a few criteria. This is easier with a builder function so we can perform this operation once. In my alternate approach, I update `WindowsFunction` to make the frame optional. Then at the point of consuming the window function data I perform a `get_frame_or_default()` that I added in. However these uses I don't expect to be something the end user experiences and is mostly upon the DataFusion code to maintain. The reasons I can see preferring to work directly with the `Expr`: 1. I think this does create a better user experience in the long run. We can perform our checks along the way, which I think is more idiomatic. Plus it relieves the user of needing to understand that we're going from an expression into a builder of an expression. 2. The code to do this is in most ways simpler. We can remove the trait entirely, and simply have functions that are implemented on `Expr`. The notable exception to the simplification is the window frame on window functions mentioned above. Laying it out like this, I can see a good argument for sticking with the builder function if that's the way you two (and others) would like to go. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org