Daniel Shields created SPARK-24385:
--------------------------------------

             Summary: Trivially-true EqualNullSafe should be handled like EqualTo in Dataset.join
                 Key: SPARK-24385
                 URL: https://issues.apache.org/jira/browse/SPARK-24385
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0, 2.2.1
            Reporter: Daniel Shields


Dataset.join(right: Dataset[_], joinExprs: Column, joinType: String) has special logic that resolves trivially-true equality predicates (attributes compared with themselves, as happens in a self-join) against both sides of the join. It currently handles regular equality (EqualTo) but not null-safe equality (EqualNullSafe); the code should be updated to handle null-safe equality the same way.

PySpark example:
{code:python}
from pyspark.sql import functions as F

df = spark.range(10)
df.join(df, 'id').collect()                           # This works.
df.join(df, df['id'] == df['id']).collect()           # This works.
df.join(df, df['id'].eqNullSafe(df['id'])).collect()  # This fails!

# Workaround: rebuilding the column gives it a fresh attribute ID,
# so the condition no longer compares an attribute with itself.
df2 = df.withColumn('id', F.col('id'))
df.join(df2, df['id'].eqNullSafe(df2['id'])).collect()
{code}
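For context, eqNullSafe (<=> in SQL) differs from == only when nulls are involved, which is precisely when users reach for it in a join condition. A minimal sketch of the semantic difference, assuming an existing SparkSession named spark:
{code:python}
# Minimal sketch of == vs. eqNullSafe semantics
# (assumes an existing SparkSession named `spark`).
df = spark.createDataFrame([(1,), (None,)], ['v'])

# Regular equality: NULL == NULL evaluates to NULL, so the row is dropped.
df.filter(df['v'] == df['v']).count()           # 1

# Null-safe equality: NULL <=> NULL evaluates to true, so the row is kept.
df.filter(df['v'].eqNullSafe(df['v'])).count()  # 2
{code}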
The relevant code in Dataset.join should look like this:
{code:scala}
// Otherwise, find the trivially true predicates and automatically resolve
// them to both sides. By the time we get here, since we have already run
// analysis, all attributes should've been resolved and become
// AttributeReference.
val cond = plan.condition.map { _.transform {
  case catalyst.expressions.EqualTo(a: AttributeReference, b: AttributeReference)
      if a.sameRef(b) =>
    catalyst.expressions.EqualTo(
      withPlan(plan.left).resolve(a.name),
      withPlan(plan.right).resolve(b.name))
  // This case is new: handle null-safe equality the same way.
  case catalyst.expressions.EqualNullSafe(a: AttributeReference, b: AttributeReference)
      if a.sameRef(b) =>
    catalyst.expressions.EqualNullSafe(
      withPlan(plan.left).resolve(a.name),
      withPlan(plan.right).resolve(b.name))
}}
{code}
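With that extra case in place, a trivially-true null-safe condition is resolved against both sides exactly like the EqualTo case. A hypothetical check against a build containing the fix:
{code:python}
# Hypothetical: run against a build that includes the fix above.
df = spark.range(10)

# The condition now resolves to left.id <=> right.id, mirroring the
# EqualTo case, so the previously failing self-join succeeds.
df.join(df, df['id'].eqNullSafe(df['id'])).collect()  # 10 rows, as with ==.
{code}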


