adriangb commented on code in PR #18449:
URL: https://github.com/apache/datafusion/pull/18449#discussion_r2507622571


##########
datafusion/physical-expr/src/utils/guarantee.rs:
##########
@@ -93,12 +93,12 @@ impl LiteralGuarantee {
     /// Create a new instance of the guarantee if the provided operator is
     /// supported. Returns None otherwise. See [`LiteralGuarantee::analyze`] to
     /// create these structures from an predicate (boolean expression).
-    fn new<'a>(
+    fn new(

Review Comment:
   Okay I've looked into this and it is entirely possible, I think we should do 
it.
   Basically the status quo currently is that we always *try* to build an 
`ArrayHashSet` which is only possible if we can convert the `Vec<ScalarValue>` 
into an `ArrayRef`.
   
   At that point the only reason to store the `Vec<SclarValue>` is to later 
pass it into `PruningPredicate` -> bloom filters and `LiteralGuarantee`. If we 
can refactor those two to also handle an `ArrayRef` we could probably avoid a 
lot of cloning and make things more efficient by using arrays. I don't even 
think we need to support `Vec<ScalarValue>` at all: the only reason to have 
that is if you could not build a homogeneously typed array, and if that is the 
case you probably can't do any sort of pushdown into a bloom filter. E.g. 
`select variant_get(col, 'abc') in (1, 2.0, 'c')` might make sense and work but 
I don't think we could ever push that down into a bloom filter. So `InListExpr` 
needs to operate on both but I don't think the pruning machinery does.
   
   So anyway I think I may try to reduce this change to only be about using 
`create_hashes` and ignore any inefficiencies as a TODO for a followup issue. 
At the end of the day if we can make HashJoinExec faster even if that's with 
some inefficiencies I think that's okay and we can improve more later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to