Terry Kim created SPARK-30065:
---------------------------------
Summary: Unable to drop na with duplicate columns
Key: SPARK-30065
URL: https://issues.apache.org/jira/browse/SPARK-30065
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Terry Kim
Calling na.drop on a DataFrame that contains duplicate column names fails with an AnalysisException, even when no columns are specified.
This should be allowed:
{code:java}
scala> val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
left: org.apache.spark.sql.DataFrame = [col1: string, col2: string]
scala> val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
right: org.apache.spark.sql.DataFrame = [col1: string, col2: string]
scala> val df = left.join(right, Seq("col1"))
df: org.apache.spark.sql.DataFrame = [col1: string, col2: string ... 1 more field]
scala> df.show
+----+----+----+
|col1|col2|col2|
+----+----+----+
|   1|null|   2|
|   3|   4|null|
+----+----+----+
scala> df.na.drop("any")
org.apache.spark.sql.AnalysisException: Reference 'col2' is ambiguous, could be: col2, col2.;
  at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:240)
{code}
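Until this is addressed, one possible workaround is to give the joined columns unique names before calling na.drop. The sketch below is illustrative only and is not the proposed fix: the names left_col2 and right_col2 are hypothetical, and it assumes the positional rename done by toDF is acceptable for the use case.
{code:java}
import org.apache.spark.sql.SparkSession

// In spark-shell a session already exists; getOrCreate() simply returns it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-30065-workaround")
  .getOrCreate()
import spark.implicits._

// Same data as in the reproduction above.
val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
val df = left.join(right, Seq("col1"))

// toDF renames columns by position, so it does not need to resolve
// the ambiguous name "col2". The new names are hypothetical.
val renamed = df.toDF("col1", "left_col2", "right_col2")

// With unique column names, na.drop can resolve every column.
val cleaned = renamed.na.drop("any")
cleaned.show()
{code}
With the sample data above, each row contains a null in one of the two col2 columns, so drop("any") returns an empty DataFrame. The point is only that the call no longer throws.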