[
https://issues.apache.org/jira/browse/SPARK-17897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706086#comment-15706086
]
Xiao Li commented on SPARK-17897:
---------------------------------
My first PR does not cover all the cases. I have now found the root cause.
The `constraints` of an operator are the expressions that evaluate to `true` for
all the rows it produces. That means the result of each constraint is neither
`false` nor `unknown` (NULL). Thus, we can infer `IsNotNull` for every
constraint, whether it is generated by the operator's own predicates or
propagated from its children. A constraint can be a complex expression. To make
better use of these constraints, we try to push `IsNotNull` down to the
lowest-level expressions. `IsNotNull` can be pushed through an expression only
when that expression is null intolerant. (When an input is NULL, a
null-intolerant expression always evaluates to NULL.)
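To make the null-intolerance property concrete, here is a minimal sketch of SQL three-valued logic. It is a toy model and not Spark code: `ThreeValuedSketch`, `SqlBool`, and the helper names are hypothetical, and `None` stands in for NULL.

```scala
object ThreeValuedSketch {
  // None models SQL NULL ("unknown"); Some(b) models a known boolean.
  type SqlBool = Option[Boolean]

  // Null-intolerant: if either input is NULL, the result is NULL.
  def greaterThan(l: Option[Int], r: Option[Int]): SqlBool =
    for (a <- l; b <- r) yield a > b

  // Null-intolerant: NOT NULL is NULL.
  def not(b: SqlBool): SqlBool = b.map(!_)

  // Not null-intolerant: a NULL input yields false, never NULL.
  def isNotNull[A](v: Option[A]): SqlBool = Some(v.isDefined)

  // A filter keeps a row only when the predicate is exactly true.
  def keeps(p: SqlBool): Boolean = p.contains(true)
}
```

In this model, a row whose column is NULL is dropped by `greaterThan(col, Some(0))`, so inferring that the column is not null is safe. The same row is kept by `not(isNotNull(col))`, so the same inference would be wrong there.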
Below is the code we have for `IsNotNull` pushdown.
{noformat}
private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr match {
  case a: Attribute => Seq(a)
  case _: NullIntolerant | IsNotNull(_: NullIntolerant) =>
    expr.children.flatMap(scanNullIntolerantExpr)
  case _ => Seq.empty[Attribute]
}
{noformat}
`IsNotNull` is not null-intolerant: it converts `null` to `false`. As long as
no `Not`-like expression sits above the `IsNotNull`, the pushdown still works;
otherwise, it can generate a wrong result, because `Not(IsNotNull(e))` is true
exactly when `e` is NULL. The above function needs to be corrected to
{noformat}
private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr match {
  case a: Attribute => Seq(a)
  case _: NullIntolerant => expr.children.flatMap(scanNullIntolerantExpr)
  case _ => Seq.empty[Attribute]
}
{noformat}
This fixes the problem, but we need a smarter fix to avoid regressions.
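The difference between the two versions of the scan can be sketched on a toy expression tree. This is not Catalyst code: `Expr`, `Attr`, `Lit`, `Add`, `scanOld`, and `scanFixed` are hypothetical stand-ins that only mirror the names used in the comment, and which toy nodes are marked `NullIntolerant` is an assumption of the sketch.

```scala
object IsNotNullPushdownSketch {
  // Toy stand-ins for Catalyst's Expression hierarchy (hypothetical).
  sealed trait Expr { def children: Seq[Expr] }
  trait NullIntolerant extends Expr

  final case class Attr(name: String) extends Expr {
    def children: Seq[Expr] = Seq.empty
  }
  final case class Lit(value: Int) extends Expr {
    def children: Seq[Expr] = Seq.empty
  }
  final case class Add(left: Expr, right: Expr) extends NullIntolerant {
    def children: Seq[Expr] = Seq(left, right)
  }
  final case class Not(child: Expr) extends NullIntolerant {
    def children: Seq[Expr] = Seq(child)
  }
  // Deliberately NOT marked NullIntolerant: it maps NULL to false.
  final case class IsNotNull(child: Expr) extends Expr {
    def children: Seq[Expr] = Seq(child)
  }

  // Mirrors the original scan: it also descends through IsNotNull.
  def scanOld(expr: Expr): Seq[Attr] = expr match {
    case a: Attr => Seq(a)
    case _: NullIntolerant | IsNotNull(_: NullIntolerant) =>
      expr.children.flatMap(scanOld)
    case _ => Seq.empty[Attr]
  }

  // Mirrors the corrected scan: it stops at anything not null-intolerant.
  def scanFixed(expr: Expr): Seq[Attr] = expr match {
    case a: Attr => Seq(a)
    case _: NullIntolerant => expr.children.flatMap(scanFixed)
    case _ => Seq.empty[Attr]
  }

  // NOT isnotnull(a + 1): true only for rows where `a` is NULL.
  val negated: Expr = Not(IsNotNull(Add(Attr("a"), Lit(1))))
}
```

In this model, `scanOld(negated)` returns `a`, so the optimizer would add `IsNotNull(a)` next to a predicate that holds only when `a` is NULL, giving an always-false filter. `scanFixed(negated)` returns nothing. The corrected scan also returns nothing for a bare `IsNotNull(Add(Attr("a"), Lit(1)))`, where inferring `a` would have been safe; this may be the kind of lost inference the regression concern refers to, though that reading is an assumption.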
> not isnotnull is converted to the always false condition isnotnull && not
> isnotnull
> -----------------------------------------------------------------------------------
>
> Key: SPARK-17897
> URL: https://issues.apache.org/jira/browse/SPARK-17897
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 2.0.0, 2.0.1
> Reporter: Jordan Halterman
> Labels: correctness
>
> When a logical plan is built containing the following somewhat nonsensical
> filter:
> {{Filter (NOT isnotnull($f0#212))}}
> During optimization the filter is converted into a condition that will always
> fail:
> {{Filter (isnotnull($f0#212) && NOT isnotnull($f0#212))}}
> This appears to be caused by the following check for {{NullIntolerant}}:
> https://github.com/apache/spark/commit/df68beb85de59bb6d35b2a8a3b85dbc447798bf5#diff-203ac90583cebe29a92c1d812c07f102R63
> Which recurses through the expression and extracts nested {{IsNotNull}}
> calls, converting them to {{IsNotNull}} calls on the attribute at the root
> level:
> https://github.com/apache/spark/commit/df68beb85de59bb6d35b2a8a3b85dbc447798bf5#diff-203ac90583cebe29a92c1d812c07f102R49
> This results in the nonsensical condition above.