erratic-pattern commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2105737013
Thanks for the detailed write up @peter-toth . Though I did mention `HashSet<Expr>` specifically, my suggestion more generally goes along the lines of using the `Hash` implementation in some way to produce the identifiers. After looking at the code a bit more, I do see the cloning/lifetime issues with using `Expr` or `&Expr` as keys directly. I also did not consider the cost of re-computing hashes. I do think in that case it does make sense to pre-compute the hash instead. I like the idea of generalizing the `(u64, &Expr)` struct into something reuseable across optimizations, as it seems to be a common pattern where we need to: * produce some unique identifier for an expression that can be stored in a data structure * use that identifier to generate aliases for newly generated expressions, or create a new `Column`/`Field` somewhere with that expression as a name. this can be done thanks to the `&Expr` in the struct which would allow us to call `display_name` * do so in a way that doesn't conflict with ownership/borrowing semantics. we might still run into borrowing issues because of the `&Expr` reference, but it's hard to say without trying to adapt this solution to other optimizers. `Rc` or `Arc` is a potential option as well. The struct could potentially be generic over `Deref<Target = Expr>` to support any of these. * avoid recomputing the hash/key on every insert/lookup operation Anyway, I don't want to over-abstract just yet, so for now just build something that works for CSE and then we can take it and see if it can be applied to any of the other optimizations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org