Saurabh Chawla created SPARK-37596:
--------------------------------------
Summary: Add the support for struct type column in the
DropDuplicate in spark
Key: SPARK-37596
URL: https://issues.apache.org/jira/browse/SPARK-37596
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.2.0
Reporter: Saurabh Chawla
Add support for struct-type columns in dropDuplicates in Spark.
Currently, passing a nested struct field to dropDuplicates throws the
exception below:
case class StructDropDup(c1: Int, c2: Int)
val df = Seq(("d1", StructDropDup(1, 2)),
("d1", StructDropDup(1, 2))).toDF("a", "b")
df.dropDuplicates("b.c1")
{code:java}
org.apache.spark.sql.AnalysisException: Cannot resolve column name "b.c1" among (a, b)
at org.apache.spark.sql.Dataset.$anonfun$dropDuplicates$1(Dataset.scala:2576)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429){code}
As a workaround, to drop duplicates on the struct field, first copy it into a
top-level column, deduplicate on that column, and then drop it:
{code:java}
df.withColumn("b.c1", col("b.c1")).dropDuplicates("b.c1").drop("b.c1").collect
res25: Array[org.apache.spark.sql.Row] = Array([d1,[1,2]]){code}
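The workaround can be wrapped in a small helper, sketched here under some assumptions. The names `DropDupStructSketch`, `dropDuplicatesByNested` and the temporary column `__dedup_key` are hypothetical and not part of Spark. The sketch assumes the input has no column already called `__dedup_key`. It uses only `withColumn`, `dropDuplicates` and `drop`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

case class StructDropDup(c1: Int, c2: Int)

object DropDupStructSketch {
  // Hypothetical helper, not a Spark API: deduplicate on a nested field
  // such as "b.c1" by materialising it as a temporary top-level column.
  def dropDuplicatesByNested(df: DataFrame, nestedPath: String): DataFrame = {
    val tmp = "__dedup_key" // assumed not to clash with an existing column
    df.withColumn(tmp, col(nestedPath)) // col("b.c1") resolves the struct field
      .dropDuplicates(tmp)              // dedup on the top-level copy
      .drop(tmp)                        // restore the original schema
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-dup-struct-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("d1", StructDropDup(1, 2)),
                 ("d1", StructDropDup(1, 2))).toDF("a", "b")

    // Both rows share b.c1 = 1, so a single row remains.
    dropDuplicatesByNested(df, "b.c1").show()
    spark.stop()
  }
}
```

Using a temporary name without a dot avoids creating a top-level column literally named "b.c1", which is easy to confuse with the nested field it was copied from.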
dropDuplicates should support nested struct fields directly, without this
workaround.