Daniel Shields created SPARK-24385:
--------------------------------------

             Summary: Trivially-true EqualNullSafe should be handled like EqualTo in Dataset.join
                 Key: SPARK-24385
                 URL: https://issues.apache.org/jira/browse/SPARK-24385
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0, 2.2.1
            Reporter: Daniel Shields

Dataset.join(right: Dataset[_], joinExprs: Column, joinType: String) has special logic for resolving trivially-true predicates to both sides. It currently handles regular equals but not null-safe equals; the code should be updated to also handle null-safe equals.

Pyspark example:
{code:java}
from pyspark.sql import functions as F

df = spark.range(10)
df.join(df, 'id').collect()  # This works.
df.join(df, df['id'] == df['id']).collect()  # This works.
df.join(df, df['id'].eqNullSafe(df['id'])).collect()  # This fails!!!

# This is a workaround that works.
df2 = df.withColumn('id', F.col('id'))
df.join(df2, df['id'].eqNullSafe(df2['id'])).collect()
{code}

The relevant code in Dataset.join should look like this:
{code:java}
// Otherwise, find the trivially true predicates and automatically resolves them to both sides.
// By the time we get here, since we have already run analysis, all attributes should've been
// resolved and become AttributeReference.
val cond = plan.condition.map { _.transform {
  case catalyst.expressions.EqualTo(a: AttributeReference, b: AttributeReference)
      if a.sameRef(b) =>
    catalyst.expressions.EqualTo(
      withPlan(plan.left).resolve(a.name),
      withPlan(plan.right).resolve(b.name))
  // This case is new!!!
  case catalyst.expressions.EqualNullSafe(a: AttributeReference, b: AttributeReference)
      if a.sameRef(b) =>
    catalyst.expressions.EqualNullSafe(
      withPlan(plan.left).resolve(a.name),
      withPlan(plan.right).resolve(b.name))
}}
{code}
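
For illustration only, the rewrite proposed above can be modelled outside Spark with a few toy classes. This is a minimal sketch under stated assumptions, not Spark code: `Attr`, `EqualTo`, `EqualNullSafe`, and `resolve_trivial` here are hypothetical stand-ins for Catalyst's AttributeReference, the two predicate classes, and the transform in Dataset.join. The `side` field is a simplification of what re-resolving an attribute against `plan.left` or `plan.right` achieves.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for an AttributeReference (hypothetical, not Spark's class)."""
    name: str
    expr_id: int                # two attributes are the same reference if these match
    side: Optional[str] = None  # "left" or "right" once tied to one join side


@dataclass(frozen=True)
class EqualTo:
    left: Attr
    right: Attr


@dataclass(frozen=True)
class EqualNullSafe:
    left: Attr
    right: Attr


Predicate = Union[EqualTo, EqualNullSafe]


def resolve_trivial(pred: Predicate) -> Predicate:
    """Rewrite a trivially-true predicate (same attribute on both sides) so the
    left operand refers to the left plan and the right operand to the right plan.
    Any other predicate is returned unchanged."""
    if pred.left.expr_id != pred.right.expr_id:
        return pred
    # type(pred) keeps EqualTo as EqualTo and EqualNullSafe as EqualNullSafe,
    # mirroring the two cases in the proposed Scala code.
    return type(pred)(
        Attr(pred.left.name, pred.left.expr_id, "left"),
        Attr(pred.right.name, pred.right.expr_id, "right"),
    )


# In a self-join, both operands start out as the very same attribute.
id_col = Attr("id", expr_id=1)
print(resolve_trivial(EqualTo(id_col, id_col)))
print(resolve_trivial(EqualNullSafe(id_col, id_col)))
```

The point of the sketch is that both predicate kinds go through the same rewrite; only the predicate class is preserved. In the toy model a single branch covers both, whereas the Scala pattern match needs one case per class.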