[
https://issues.apache.org/jira/browse/SPARK-46671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Asif updated SPARK-46671:
-------------------------
Affects Version/s: 4.0.0
> InferFiltersFromConstraint rule is creating a redundant filter
> --------------------------------------------------------------
>
> Key: SPARK-46671
> URL: https://issues.apache.org/jira/browse/SPARK-46671
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Asif
> Priority: Minor
> Labels: SQL, catalyst
>
> While bringing my old PR, which uses a different approach to the
> ConstraintPropagation algorithm
> ([SPARK-33152|https://issues.apache.org/jira/browse/SPARK-33152]), in sync
> with current master, I noticed a test failure in my branch for SPARK-33152.
> The failing test is in InferFiltersFromConstraintsSuite:
> {code}
> test("SPARK-43095: Avoid Once strategy's idempotence is broken for batch: Infer Filters") {
>   val x = testRelation.as("x")
>   val y = testRelation.as("y")
>   val z = testRelation.as("z")
>   // Removes EqualNullSafe when constructing candidate constraints
>   comparePlans(
>     InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
>       .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
>     x.select($"x.a", $"x.a".as("xa"))
>       .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" &&
>         $"xa" === $"x.a").analyze)
>   // Once strategy's idempotence is not broken
>   val originalQuery =
>     x.join(y, condition = Some($"x.a" === $"y.a"))
>       .select($"x.a", $"x.a".as("xa")).as("xy")
>       .join(z, condition = Some($"xy.a" === $"z.a")).analyze
>   val correctAnswer =
>     x.where($"a".isNotNull)
>       .join(y.where($"a".isNotNull), condition = Some($"x.a" === $"y.a"))
>       .select($"x.a", $"x.a".as("xa")).as("xy")
>       .join(z.where($"a".isNotNull), condition = Some($"xy.a" === $"z.a")).analyze
>   val optimizedQuery = InferFiltersFromConstraints(originalQuery)
>   comparePlans(optimizedQuery, correctAnswer)
>   comparePlans(InferFiltersFromConstraints(optimizedQuery), correctAnswer)
> }
> {code}
> In the above test, I believe the assertion below is incorrect:
> it expects a redundant filter to be created.
> Of these two isNotNull constraints, only one should be created:
> $"xa".isNotNull && $"x.a".isNotNull
> Because "xa" is an alias of x."a", only one isNotNull constraint is needed.
> // Removes EqualNullSafe when constructing candidate constraints
> comparePlans(
>   InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
>     .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
>   x.select($"x.a", $"x.a".as("xa"))
>     .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" &&
>       $"xa" === $"x.a").analyze)
> This is not a big issue, but it highlights the need to revisit the
> ConstraintPropagation code and related code.
> I am filing this Jira so that the constraint code can be tightened and made
> more robust.
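To make the reporter's argument concrete, here is a minimal, Spark-free sketch of alias-aware deduplication of isNotNull constraints. It is not Catalyst code and not the approach taken in SPARK-33152: the `Constraint` types, the `AliasAwareConstraints` object, and the alias map are all hypothetical names invented for illustration. The only assumption carried over from the report is that "xa" is an alias of "x.a", so a not-null check on one implies the same check on the other.

```scala
// Hypothetical toy model; none of these names are Catalyst APIs.
sealed trait Constraint
final case class IsNotNull(attr: String) extends Constraint
final case class EqualTo(left: String, right: String) extends Constraint

object AliasAwareConstraints {
  // Resolve an attribute name to the attribute it aliases, if any.
  def canonical(attr: String, aliases: Map[String, String]): String =
    aliases.getOrElse(attr, attr)

  // Keep only the first IsNotNull seen for each canonical attribute;
  // all other constraints pass through unchanged.
  def dedupeNotNull(
      constraints: Seq[Constraint],
      aliases: Map[String, String]): Seq[Constraint] = {
    val seen = scala.collection.mutable.Set.empty[String]
    constraints.filter {
      // Set.add returns false when the element was already present.
      case IsNotNull(a) => seen.add(canonical(a, aliases))
      case _            => true
    }
  }

  def main(args: Array[String]): Unit = {
    val aliases = Map("xa" -> "x.a") // "xa" is an alias of "x.a"
    val inferred = Seq(IsNotNull("xa"), IsNotNull("x.a"), EqualTo("xa", "x.a"))
    println(dedupeNotNull(inferred, aliases))
  }
}
```

In this toy model the two not-null checks from the disputed assertion collapse into one, which is the behavior the reporter argues the rule should have. How an optimizer would track aliases in practice is outside the scope of this sketch.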