Asif created SPARK-46671:
----------------------------
Summary: InferFiltersFromConstraint rule is creating a redundant filter
Key: SPARK-46671
URL: https://issues.apache.org/jira/browse/SPARK-46671
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.5.0
Reporter: Asif
While bringing my old PR, which uses a different approach to the
ConstraintPropagation algorithm
([SPARK-33152|https://issues.apache.org/jira/browse/SPARK-33152]), in sync with
current master, I noticed a test failure in my branch for SPARK-33152.
The failing test is in InferFiltersFromConstraintSuite:
{code}
test("SPARK-43095: Avoid Once strategy's idempotence is broken for batch: Infer Filters") {
  val x = testRelation.as("x")
  val y = testRelation.as("y")
  val z = testRelation.as("z")

  // Removes EqualNullSafe when constructing candidate constraints
  comparePlans(
    InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
      .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
    x.select($"x.a", $"x.a".as("xa"))
      .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)

  // Once strategy's idempotence is not broken
  val originalQuery =
    x.join(y, condition = Some($"x.a" === $"y.a"))
      .select($"x.a", $"x.a".as("xa")).as("xy")
      .join(z, condition = Some($"xy.a" === $"z.a")).analyze
  val correctAnswer =
    x.where($"a".isNotNull).join(y.where($"a".isNotNull), condition = Some($"x.a" === $"y.a"))
      .select($"x.a", $"x.a".as("xa")).as("xy")
      .join(z.where($"a".isNotNull), condition = Some($"xy.a" === $"z.a")).analyze
  val optimizedQuery = InferFiltersFromConstraints(originalQuery)
  comparePlans(optimizedQuery, correctAnswer)
  comparePlans(InferFiltersFromConstraints(optimizedQuery), correctAnswer)
}
{code}
In the above test, I believe the assertion below is not correct: it expects a
redundant filter to be created.
Of these two isNotNull constraints, only one should be created:
$"xa".isNotNull && $"x.a".isNotNull
The presence of (xa#0 = a#0) automatically implies that if one attribute
is not null, the other must also be not null.
  // Removes EqualNullSafe when constructing candidate constraints
  comparePlans(
    InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
      .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
    x.select($"x.a", $"x.a".as("xa"))
      .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)
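The implication can be checked outside Spark. The following is only a sketch in plain Scala, not Catalyst code: it models a nullable column value as Option[Int] and SQL's three-valued = as a function returning Option[Boolean]. All names in it (NullnessImpliedByEquality, sqlEq, passes, survivors) are hypothetical and chosen for illustration. Under that model, every row that survives an equality filter has both operands non-null, so a second isNotNull check filters nothing further.

```scala
object NullnessImpliedByEquality {
  // A nullable SQL integer is modeled as Option[Int]; None stands for NULL.
  type SqlInt = Option[Int]

  // SQL `=`: the result is NULL (None) whenever either operand is NULL.
  def sqlEq(l: SqlInt, r: SqlInt): Option[Boolean] =
    for (x <- l; y <- r) yield x == y

  // A filter keeps a row only when its predicate evaluates to TRUE
  // (NULL and FALSE both drop the row).
  def passes(pred: Option[Boolean]): Boolean = pred.contains(true)

  // All combinations of NULL, 1 and 2 for the two columns (xa, a).
  val samples: Seq[SqlInt] = Seq(None, Some(1), Some(2))
  val rows: Seq[(SqlInt, SqlInt)] = for (xa <- samples; a <- samples) yield (xa, a)

  // Rows that survive `xa = a` alone.
  val survivors: Seq[(SqlInt, SqlInt)] =
    rows.filter { case (xa, a) => passes(sqlEq(xa, a)) }

  // Rows that survive `xa IS NOT NULL AND xa = a`, then also `a IS NOT NULL`.
  val withOneCheck: Seq[(SqlInt, SqlInt)] = survivors.filter(_._1.isDefined)
  val withBothChecks: Seq[(SqlInt, SqlInt)] = withOneCheck.filter(_._2.isDefined)

  def main(args: Array[String]): Unit = {
    // Equality alone already rules out NULL on both sides.
    assert(survivors.forall { case (xa, a) => xa.isDefined && a.isDefined })
    // The second isNotNull check does not remove any additional row.
    assert(withOneCheck == withBothChecks)
    println(survivors)
  }
}
```

This only shows that the extra predicate is logically redundant in this simplified model; whether the optimizer should emit one or both isNotNull filters is the question raised by this ticket.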
This is not a big issue, but it highlights the need to revisit the
ConstraintPropagation code and the code related to it.
I am filing this Jira so that the constraint code can be tightened and made more robust.