Hi, I am having trouble renaming or moving subcolumns when they are inside a repeated structure.
Given a certain schema, I can create a different layout to provide an alternative view. For example, I can move one column into a struct subcolumn and add an extra literal field, just for fun:

import org.apache.spark.sql.{DataFrame, Column}
import org.apache.spark.sql.functions
import sqlContext.implicits._

case class Level0ArrayStruct(
  level_0_array_a: String,
  level_0_array_b: Int)

case class Level1ArrayStruct(
  level_1_array_a: String,
  level_1_array_b: Int)

case class Level1Struct(
  level_1_a: String,
  level_1_b: Int)

case class Level0Struct(
  level_0_a: String,
  level_0_b: Int,
  level_0_array: Seq[Level0ArrayStruct],
  level_0_struct: Level1Struct)

val example = sc.parallelize(
  Seq(Level0Struct(
    "level 0 a",
    0,
    Seq(
      Level0ArrayStruct("level 0 array a 1", 1),
      Level0ArrayStruct("level 0 array a 2", 2)),
    Level1Struct(
      "level 1 a", 3)))).toDF

scala> example.printSchema
root
 |-- level_0_a: string (nullable = true)
 |-- level_0_b: integer (nullable = false)
 |-- level_0_array: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- level_0_array_a: string (nullable = true)
 |    |    |-- level_0_array_b: integer (nullable = false)
 |-- level_0_struct: struct (nullable = true)
 |    |-- level_1_a: string (nullable = true)
 |    |-- level_1_b: integer (nullable = false)

scala> example.withColumn("level_0_struct", functions.struct($"level_0_struct.level_1_a", $"level_0_struct.level_1_b", $"level_0_b", functions.lit("foo").as("foo"))).drop("level_0_b").printSchema
root
 |-- level_0_a: string (nullable = true)
 |-- level_0_array: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- level_0_array_a: string (nullable = true)
 |    |    |-- level_0_array_b: integer (nullable = false)
 |-- level_0_struct: struct (nullable = false)
 |    |-- level_1_a: string (nullable = true)
 |    |-- level_1_b: integer (nullable = true)
 |    |-- level_0_b: integer (nullable = false)
 |    |-- foo: string (nullable = false)

However, I can't find a way to reliably deal with the struct inside level_0_array. If I try to move any of its fields anywhere (even within that same array column), each field becomes an array column itself, and I don't know how to reassemble ("zip") them back together into a single array of structs.

Say I want to add the same literal "foo", but this time inside level_0_array, for every element of the array in every row. The straightforward attempt gives this schema instead:

scala> example.withColumn("level_0_array", functions.struct($"level_0_array.level_0_array_a", $"level_0_array.level_0_array_b", functions.lit("foo").as("foo"))).printSchema
root
 |-- level_0_a: string (nullable = true)
 |-- level_0_b: integer (nullable = false)
 |-- level_0_array: struct (nullable = false)
 |    |-- level_0_array_a: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- level_0_array_b: array (nullable = true)
 |    |    |-- element: integer (containsNull = true)
 |    |-- foo: string (nullable = false)
 |-- level_0_struct: struct (nullable = true)
 |    |-- level_1_a: string (nullable = true)
 |    |-- level_1_b: integer (nullable = false)

The same problem occurs if I try to rename the fields: they become array columns.

Is there any way to recursively manipulate repeated columns without completely breaking their structure into individually repeated fields?

Best
--
Samuel