alamb commented on code in PR #3334:
URL: https://github.com/apache/arrow-datafusion/pull/3334#discussion_r962139573
##########
datafusion/sql/src/planner.rs:
##########
@@ -2453,6 +2453,17 @@ fn remove_join_expressions(
(_, Some(rr)) => Ok(Some(rr)),
_ => Ok(None),
}
+ },
+ // Fix for issue#78 join predicates from inside of OR expr also pulled up properly.
+ Operator::Or => {
Review Comment:
In general, I think this kind of rewrite should be done in an SQL optimizer
pass so that it applies to any query plan (e.g. one that came from the DataFrame
API) rather than only to plans built from SQL.
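
To illustrate the kind of rewrite meant here, below is a minimal sketch of factoring a predicate shared by every `OR` branch out of the disjunction, so that `(j AND a) OR (j AND b)` becomes `j AND (a OR b)`. The `Expr` enum and the helper names (`conjuncts`, `disjuncts`, `factor_common`) are hypothetical stand-ins invented for this example; they are not DataFusion's actual `Expr` type or optimizer rule API.

```rust
/// Toy expression tree. A hypothetical stand-in, NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A leaf predicate kept as text, e.g. "p_partkey = l_partkey".
    Pred(String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Flatten nested ANDs into a list of conjuncts.
fn conjuncts(e: &Expr) -> Vec<Expr> {
    match e {
        Expr::And(l, r) => {
            let mut v = conjuncts(l);
            v.extend(conjuncts(r));
            v
        }
        other => vec![other.clone()],
    }
}

/// Flatten nested ORs into a list of disjuncts (always at least one).
fn disjuncts(e: &Expr) -> Vec<Expr> {
    match e {
        Expr::Or(l, r) => {
            let mut v = disjuncts(l);
            v.extend(disjuncts(r));
            v
        }
        other => vec![other.clone()],
    }
}

/// Split `e` into (conjuncts common to every OR branch, remaining predicate).
/// A remainder of `None` means the rest is trivially true.
fn factor_common(e: &Expr) -> (Vec<Expr>, Option<Expr>) {
    let branches: Vec<Vec<Expr>> = disjuncts(e).iter().map(conjuncts).collect();

    // A conjunct is common if it appears in every branch.
    let common: Vec<Expr> = branches[0]
        .iter()
        .filter(|c| branches.iter().all(|b| b.contains(c)))
        .cloned()
        .collect();

    // Strip the common conjuncts from each branch and rebuild the OR.
    let mut rest = Vec::new();
    for b in &branches {
        let kept = b.iter().filter(|c| !common.contains(c)).cloned();
        match kept.reduce(|l, r| Expr::And(Box::new(l), Box::new(r))) {
            Some(x) => rest.push(x),
            // This branch was only the common part, so the OR adds nothing.
            None => return (common, None),
        }
    }
    let remainder = rest
        .into_iter()
        .reduce(|l, r| Expr::Or(Box::new(l), Box::new(r)));
    (common, remainder)
}
```

Because a pass like this works on the expression tree rather than on SQL text, it would apply equally to a plan built through the DataFrame API. The factored-out predicates (such as an equality between columns of two tables) could then be considered as join keys, leaving only the remainder in the filter.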
##########
datafusion/core/tests/sql/predicates.rs:
##########
@@ -427,11 +427,10 @@ async fn multiple_or_predicates() -> Result<()> {
let expected =vec![
"Explain [plan_type:Utf8, plan:Utf8]",
" Projection: #lineitem.l_partkey [l_partkey:Int64]",
- " Projection: #part.p_partkey = #lineitem.l_partkey AS
BinaryExpr-=Column-lineitem.l_partkeyColumn-part.p_partkey,
#lineitem.l_partkey, #lineitem.l_quantity, #part.p_brand, #part.p_size
[BinaryExpr-=Column-lineitem.l_partkeyColumn-part.p_partkey:Boolean;N,
l_partkey:Int64, l_quantity:Float64, p_brand:Utf8, p_size:Int32]",
- " Filter: #part.p_partkey = #lineitem.l_partkey AND #part.p_brand
= Utf8(\"Brand#12\") AND #lineitem.l_quantity >= Int64(1) AND
#lineitem.l_quantity <= Int64(11) AND #part.p_size BETWEEN Int64(1) AND
Int64(5) OR #part.p_brand = Utf8(\"Brand#23\") AND #lineitem.l_quantity >=
Int64(10) AND #lineitem.l_quantity <= Int64(20) AND #part.p_size BETWEEN
Int64(1) AND Int64(10) OR #part.p_brand = Utf8(\"Brand#34\") AND
#lineitem.l_quantity >= Int64(20) AND #lineitem.l_quantity <= Int64(30) AND
#part.p_size BETWEEN Int64(1) AND Int64(15) [l_partkey:Int64,
l_quantity:Float64, p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
- " CrossJoin: [l_partkey:Int64, l_quantity:Float64,
p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
- " TableScan: lineitem projection=[l_partkey, l_quantity]
[l_partkey:Int64, l_quantity:Float64]",
- " TableScan: part projection=[p_partkey, p_brand, p_size]
[p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
+ " Filter: #part.p_brand = Utf8(\"Brand#12\") AND
#lineitem.l_quantity >= Int64(1) AND #lineitem.l_quantity <= Int64(11) AND
#part.p_size BETWEEN Int64(1) AND Int64(5) OR #part.p_brand =
Utf8(\"Brand#23\") AND #lineitem.l_quantity >= Int64(10) AND
#lineitem.l_quantity <= Int64(20) AND #part.p_size BETWEEN Int64(1) AND
Int64(10) OR #part.p_brand = Utf8(\"Brand#34\") AND #lineitem.l_quantity >=
Int64(20) AND #lineitem.l_quantity <= Int64(30) AND #part.p_size BETWEEN
Int64(1) AND Int64(15) [l_partkey:Int64, l_quantity:Float64, p_partkey:Int64,
p_brand:Utf8, p_size:Int32]",
Review Comment:
This is definitely the better plan 👍 @DhamoPS