I have gotten used to Spark always returning a WrappedArray for Seq. At
some point I think I even read this was guaranteed to be the case, but I'm
not sure if it still is.

In Spark 3.0.1 with Scala 2.12 I get a WrappedArray as expected:

scala> val x = Seq((1,2),(1,3)).toDF
x: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

scala>
x.groupBy("_1").agg(collect_list(col("_2")).as("_3")).withColumn("class_of_3",
udf{ (s: Seq[Int]) => s.getClass.toString }.apply(col("_3"))).show(false)
+---+------+-------------------------------------------------+
|_1 |_3    |class_of_3                                       |
+---+------+-------------------------------------------------+
|1  |[2, 3]|class scala.collection.mutable.WrappedArray$ofRef|
+---+------+-------------------------------------------------+

But when I build current master with Scala 2.13 I get:

scala> val x = Seq((1,2),(1,3)).toDF
warning: 1 deprecation (since 2.13.3); for details, enable `:setting
-deprecation' or `:replay -deprecation'
val x: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

scala>
x.groupBy("_1").agg(collect_list(col("_2")).as("_3")).withColumn("class",
udf{ (s: Seq[Int]) => s.getClass.toString }.apply(col("_3"))).show(false)
+---+------+---------------------------------------------+
|_1 |_3    |class                                        |
+---+------+---------------------------------------------+
|1  |[2, 3]|class scala.collection.immutable.$colon$colon|
+---+------+---------------------------------------------+

I am curious whether the plan is to return an immutable Seq going forward
(which would be nice)? And if so, is List the best choice? I was sort of
guessing it would be an immutable ArraySeq, given that it provides
efficient ways to wrap an array and to access the underlying array.
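To illustrate why ArraySeq seems like a natural fit, here is a minimal
standalone sketch (plain Scala 2.13, no Spark involved) of the two
operations I had in mind: wrapping an existing array without copying, and
getting the underlying array back:

```scala
import scala.collection.immutable.ArraySeq

object ArraySeqSketch extends App {
  // An array as Spark might produce internally for collect_list output.
  val arr = Array(2, 3)

  // O(1) wrap: no copy is made, the ArraySeq shares the array.
  val seq: ArraySeq[Int] = ArraySeq.unsafeWrapArray(arr)
  assert(seq == Seq(2, 3))

  // O(1) access to the underlying array (same instance, not a copy).
  val back: Array[_] = seq.unsafeArray
  assert(back eq arr)
}
```

By contrast, building a List from an n-element array is O(n) and loses the
flat memory layout, which is why I was surprised to see `$colon$colon`
(i.e. List) in the output above.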

Best
