Saurabh Chawla created SPARK-37596:
--------------------------------------

             Summary: Add the support for struct type column in the 
DropDuplicate in spark
                 Key: SPARK-37596
                 URL: https://issues.apache.org/jira/browse/SPARK-37596
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Saurabh Chawla


Add support for struct-type columns in dropDuplicates in Spark.


Currently, passing a struct field (for example "b.c1") to dropDuplicates throws 
the exception below:

case class StructDropDup(c1: Int, c2: Int)

val df = Seq(("d1", StructDropDup(1, 2)),
      ("d1", StructDropDup(1, 2))).toDF("a", "b")

df.dropDuplicates("b.c1")
{code:java}
org.apache.spark.sql.AnalysisException: Cannot resolve column name "b.c1" among (a, b)
  at org.apache.spark.sql.Dataset.$anonfun$dropDuplicates$1(Dataset.scala:2576)
  at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
  at scala.collection.Iterator.foreach(Iterator.scala:941)
  at scala.collection.Iterator.foreach$(Iterator.scala:941)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1429){code}
As a workaround, duplicates can be dropped on the struct field by first copying it into a top-level column:

df.withColumn("b.c1", col("b.c1")).dropDuplicates("b.c1").drop("b.c1").collect
{code:java}
df.withColumn("b.c1", col("b.c1")).dropDuplicates("b.c1").drop("b.c1").collect
res25: Array[org.apache.spark.sql.Row] = Array([d1,[1,2]]){code}
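A variant of the same workaround, written as a standalone program, might look like the sketch below. It assumes a local SparkSession; the helper column name "b_c1", the object name and the function name are hypothetical and chosen only for illustration. Using a name without a dot avoids any ambiguity between the literal column "b.c1" and the nested field b.c1. Which of the duplicate rows is kept is not guaranteed.
{code:scala}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Same shape as the struct used in the report.
case class StructDropDup(c1: Int, c2: Int)

object DropDuplicatesOnStructField {

  // Copy the nested field b.c1 into a temporary top-level column,
  // de-duplicate on that column, then drop it again.
  // "b_c1" is a hypothetical helper name, not part of any Spark API.
  def dropDuplicatesOnC1(df: DataFrame): DataFrame =
    df.withColumn("b_c1", col("b.c1"))
      .dropDuplicates("b_c1")
      .drop("b_c1")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-37596-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("d1", StructDropDup(1, 2)),
      ("d1", StructDropDup(1, 2))).toDF("a", "b")

    // Both input rows share b.c1 = 1, so a single row should remain.
    dropDuplicatesOnC1(df).show()

    spark.stop()
  }
}
{code}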
dropDuplicates should support struct fields directly, so that this workaround is not needed.


