I have gotten used to Spark always returning a WrappedArray for Seq. At some point I think I even read this was guaranteed to be the case. Not sure if it still is...
In Spark 3.0.1 with Scala 2.12 I get a WrappedArray as expected:

    scala> val x = Seq((1,2),(1,3)).toDF
    x: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

    scala> x.groupBy("_1").agg(collect_list(col("_2")).as("_3")).withColumn("class_of_3", udf{ (s: Seq[Int]) => s.getClass.toString }.apply(col("_3"))).show(false)
    +---+------+-------------------------------------------------+
    |_1 |_3    |class_of_3                                       |
    +---+------+-------------------------------------------------+
    |1  |[2, 3]|class scala.collection.mutable.WrappedArray$ofRef|
    +---+------+-------------------------------------------------+

But when I build current master with Scala 2.13 I get:

    scala> val x = Seq((1,2),(1,3)).toDF
    warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation' or `:replay -deprecation'
    val x: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

    scala> x.groupBy("_1").agg(collect_list(col("_2")).as("_3")).withColumn("class", udf{ (s: Seq[Int]) => s.getClass.toString }.apply(col("_3"))).show(false)
    +---+------+---------------------------------------------+
    |_1 |_3    |class                                        |
    +---+------+---------------------------------------------+
    |1  |[2, 3]|class scala.collection.immutable.$colon$colon|
    +---+------+---------------------------------------------+

I am curious whether the plan is to return an immutable Seq going forward (which is nice)? And if so, is List the best choice? I was sort of guessing it would be an immutable ArraySeq, given that it provides efficient ways to wrap an array and to access the underlying array.

best
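For context on why I was guessing immutable ArraySeq: in Scala 2.13 it can wrap an existing array in O(1) without copying, via `ArraySeq.unsafeWrapArray`. A minimal sketch (plain Scala 2.13, nothing Spark-specific; the "unsafe" caveat is that later mutation of the wrapped array would be visible through the ArraySeq):

```scala
import scala.collection.immutable.ArraySeq

object ArraySeqDemo extends App {
  val arr = Array(2, 3)

  // Wraps the existing array without copying it, unlike ArraySeq(arr: _*),
  // which builds a fresh copy. "Unsafe" because the caller must promise not
  // to mutate `arr` afterwards -- the ArraySeq shares the same storage.
  val seq: ArraySeq[Int] = ArraySeq.unsafeWrapArray(arr)

  println(seq)            // e.g. ArraySeq(2, 3)
  println(seq.getClass)   // a specialized subclass such as ArraySeq$ofInt
}
```

So a List gives cheap prepends, but ArraySeq keeps the O(1) indexed access and array-backed layout that WrappedArray users presumably rely on today.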