[ 
https://issues.apache.org/jira/browse/SPARK-17897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706086#comment-15706086
 ] 

Xiao Li commented on SPARK-17897:
---------------------------------

My first PR does not cover all the cases. Found the root cause. 

The `constraints` of an operator is the expressions that evaluate to `true` for 
all the rows produced. That means, the expression result should be neither 
`false` nor `unknown` (NULL). Thus, we can conclude that `IsNotNull` on all the 
constraints, which are generated by its own predicates or propagated from the 
children. The constraint can be a complex expression. For better usage of these 
constraints, we try to push down `IsNotNull` to the lowest-level expressions. 
`IsNotNull` can be pushed through an expression when it is null intolerant. 
(When the input is NULL, the null-intolerant expression always evaluates to 
null.)

Below is the code we have for `IsNotNull` pushdown. 

{noformat}
  private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr 
match {
    case a: Attribute => Seq(a)
    case _: NullIntolerant | IsNotNull(_: NullIntolerant) =>
      expr.children.flatMap(scanNullIntolerantExpr)
    case _ => Seq.empty[Attribute]
  }
{noformat}

`IsNotNull` is not null-intolerant. It converts `null` to `false`. If there 
does not exist any `Not`-like expression, it works; otherwise, it could 
generate a wrong result. The above function needs to be corrected to 

{noformat}
  private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr 
match {
    case a: Attribute => Seq(a)
    case _: NullIntolerant => expr.children.flatMap(scanNullIntolerantExpr)
    case _ => Seq.empty[Attribute]
  }
{noformat}

This fixes the problem, but we need a smarter fix for avoiding regressions. 

> not isnotnull is converted to the always false condition isnotnull && not 
> isnotnull
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-17897
>                 URL: https://issues.apache.org/jira/browse/SPARK-17897
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Jordan Halterman
>              Labels: correctness
>
> When a logical plan is built containing the following somewhat nonsensical 
> filter:
> {{Filter (NOT isnotnull($f0#212))}}
> During optimization the filter is converted into a condition that will always 
> fail:
> {{Filter (isnotnull($f0#212) && NOT isnotnull($f0#212))}}
> This appears to be caused by the following check for {{NullIntolerant}}:
> https://github.com/apache/spark/commit/df68beb85de59bb6d35b2a8a3b85dbc447798bf5#diff-203ac90583cebe29a92c1d812c07f102R63
> Which recurses through the expression and extracts nested {{IsNotNull}} 
> calls, converting them to {{IsNotNull}} calls on the attribute at the root 
> level:
> https://github.com/apache/spark/commit/df68beb85de59bb6d35b2a8a3b85dbc447798bf5#diff-203ac90583cebe29a92c1d812c07f102R49
> This results in the nonsensical condition above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to