erratic-pattern commented on issue #10426:
URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2105737013

   Thanks for the detailed write up @peter-toth .  Though I did mention 
`HashSet<Expr>` specifically, my suggestion more generally goes along the lines 
of using the `Hash` implementation in some way to produce the identifiers. 
After looking at the code a bit more, I do see the cloning/lifetime issues with 
using `Expr` or `&Expr` as keys directly. I also did not consider the cost of 
re-computing hashes. I do think in that case it does make sense to pre-compute 
the hash instead.
   
   I like the idea of generalizing the `(u64, &Expr)` struct into something 
reuseable across optimizations, as it seems to be a common pattern where we 
need to:
   
   * produce some unique identifier for an expression that can be stored in a 
data structure
   * use that identifier to generate aliases for newly generated expressions, 
or create a new `Column`/`Field` somewhere with that expression as a name. this 
can be done thanks to the `&Expr` in the struct which would allow us to call 
`display_name` 
   * do so in a way that doesn't conflict with ownership/borrowing semantics. 
we might still run into borrowing issues because of the `&Expr` reference, but 
it's hard to say without trying to adapt this solution to other optimizers. 
`Rc` or `Arc` is a potential option as well. The struct could potentially be 
generic over `Deref<Target = Expr>` to support any of these.
   * avoid recomputing the hash/key on every insert/lookup operation
   
   Anyway, I don't want to over-abstract just yet, so for now just build 
something that works for CSE and then we can take it and see if it can be 
applied to any of the other optimizations.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to