peter-toth commented on code in PR #10473:
URL: https://github.com/apache/datafusion/pull/10473#discussion_r1604645685
##########
datafusion/optimizer/src/common_subexpr_eliminate.rs:
##########
@@ -48,7 +51,42 @@ use indexmap::IndexMap;
/// Since an identifier is likely to be copied many times, it is better that
an identifier
/// is small or "copy". otherwise some kinds of reference count is needed.
String
/// description here is not such a good choose.
-type Identifier = String;
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq)]
+struct Identifier<'n> {
+ hash: u64,
+ expr: &'n Expr,
+}
+
+const SEED: RandomState = RandomState::with_seeds(0, 0, 0, 0);
+
+impl<'n> Identifier<'n> {
+ pub fn new(expr: &'n Expr) -> Self {
+ let mut hasher = SEED.build_hasher();
+ expr.hash_node(&mut hasher);
+ let hash = hasher.finish();
+ Self { hash, expr }
+ }
+
+ pub fn combine(mut self, other: Option<Self>) -> Self {
+ other.map_or(self, |other_id| {
+ self.hash = combine_hashes(self.hash, other_id.hash);
+ self
+ })
+ }
+}
+
+impl Hash for Identifier<'_> {
+ fn hash<H: Hasher>(&self, state: &mut H) {
+ state.write_u64(self.hash);
+ }
+}
+
+impl From<Identifier<'_>> for String {
+ fn from(id: Identifier<'_>) -> Self {
+ format!("common_{}", id.hash)
Review Comment:
I'm still undecided if we should use CSE identifiers for aliases too. It is
ok to use them in the data structures of CSE for the elimination logic as these
identifiers contain a reference to `Expr` too so in case we have hash collision
the equality check can save us, but as soon as we use only the hash part for
aliasing alias collision can happen.
I feel we should probably ispect the schema and generate unique aliases
based on the existing column...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]