[jira] [Created] (SPARK-37596) Add the support for struct type column in the DropDuplicate in spark

Saurabh Chawla (Jira) Thu, 09 Dec 2021 03:06:43 -0800

Saurabh Chawla created SPARK-37596:
--------------------------------------

             Summary: Add the support for struct type column in the 
DropDuplicate in spark
                 Key: SPARK-37596
                 URL: https://issues.apache.org/jira/browse/SPARK-37596
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Saurabh Chawla



Add support for struct type columns (nested fields) in dropDuplicates in Spark.


Currently, passing a nested struct field to dropDuplicates throws the 
exception below:

case class StructDropDup(c1: Int, c2: Int)

val df = Seq(("d1", StructDropDup(1, 2)),
      ("d1", StructDropDup(1, 2))).toDF("a", "b")

df.dropDuplicates("b.c1")
{code:java}
org.apache.spark.sql.AnalysisException: Cannot resolve column name "b.c1" among (a, b)
  at org.apache.spark.sql.Dataset.$anonfun$dropDuplicates$1(Dataset.scala:2576)
  at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
  at scala.collection.Iterator.foreach(Iterator.scala:941)
  at scala.collection.Iterator.foreach$(Iterator.scala:941)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1429){code}
As a workaround, copy the nested field into a top-level column, drop duplicates on that column, and then drop the column again:
{code:java}
df.withColumn("b.c1", col("b.c1")).dropDuplicates("b.c1").drop("b.c1").collect
res25: Array[org.apache.spark.sql.Row] = Array([d1,[1,2]]){code}
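
The workaround above can be wrapped in a small helper. This is only a sketch of the same three steps. It is not part of the Spark API and not the requested fix: the object name StructDedup, the method dropDuplicatesOn, and the temporary column name __dedup_key are hypothetical, and the sketch assumes the input has no column with that temporary name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Same shape as the reproduction in this report.
case class StructDropDup(c1: Int, c2: Int)

object StructDedup {
  // Hypothetical temporary column name; assumed not to exist in the input.
  private val TmpCol = "__dedup_key"

  // Hypothetical helper: deduplicate on a nested field such as "b.c1".
  def dropDuplicatesOn(df: DataFrame, fieldPath: String): DataFrame =
    df.withColumn(TmpCol, col(fieldPath)) // copy the nested field to the top level
      .dropDuplicates(TmpCol)             // deduplicate on the copied column
      .drop(TmpCol)                       // restore the original schema

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-37596-workaround")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("d1", StructDropDup(1, 2)),
      ("d1", StructDropDup(1, 2))).toDF("a", "b")

    // Both rows share b.c1 = 1, so one row remains.
    dropDuplicatesOn(df, "b.c1").show()
    spark.stop()
  }
}
```

Using a temporary name without a dot avoids having a top-level column literally called "b.c1" next to the struct column b.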
dropDuplicates should support nested struct fields directly, so that this workaround is not needed.



