[ https://issues.apache.org/jira/browse/SPARK-39292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17546058#comment-17546058 ]
Enrico Minack commented on SPARK-39292:
---------------------------------------

This is being fixed as part of https://issues.apache.org/jira/browse/SPARK-39292

> Make Dataset.melt work with struct fields
> -----------------------------------------
>
>                 Key: SPARK-39292
>                 URL: https://issues.apache.org/jira/browse/SPARK-39292
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> In SPARK-38864, the melt function was added to Dataset.
> It would be nice if fields of struct fields could be used as id and value
> columns. This would allow for the following:
> Given a Dataset with the following schema:
> {code:java}
> root
>  |-- an: struct (nullable = false)
>  |    |-- id: integer (nullable = false)
>  |-- str: struct (nullable = false)
>  |    |-- one: string (nullable = true)
>  |    |-- two: string (nullable = true)
> {code}
> For example:
> {code:java}
> +---+-------------+
> | an|          str|
> +---+-------------+
> |{1}|   {one, One}|
> |{2}|  {two, null}|
> |{3}|{null, three}|
> |{4}| {null, null}|
> +---+-------------+
> {code}
> Melting with value columns {{Seq("str.one", "str.two")}} on id columns
> {{Seq("an.id")}} would result in
> {code:java}
> +--+--------+-----+
> |an|variable|value|
> +--+--------+-----+
> | 1| str.one|  one|
> | 1| str.two|  One|
> | 2| str.one|  two|
> | 2| str.two| null|
> | 3| str.one| null|
> | 3| str.two|three|
> | 4| str.one| null|
> | 4| str.two| null|
> +--+--------+-----+
> {code}
> See test in {{org.apache.spark.sql.MeltSuite}}:
> {code:java}
> test("SPARK-39292: melt with struct fields") {
>   val df = meltWideDataDs.select(
>     struct($"id").as("an"),
>     struct(
>       $"str1".as("one"),
>       $"str2".as("two")
>     ).as("str")
>   )
>   checkAnswer(
>     Melt.of(df, Seq("an.id"), Seq("str.one", "str.two"), false, "variable", "value"),
>     meltedWideDataRows.map(row => Row(
>       row.getInt(0),
>       row.getString(1) match {
>         case "str1" => "str.one"
>         case "str2" => "str.two"
>       },
>       row.getString(2)
>     ))
>   )
> }
> {code}
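
For readers unfamiliar with melt, the row-level semantics of the example in the issue description can be sketched in plain Scala, without Spark. This is only an illustration under stated assumptions: the names An, Str, Wide, Melted and MeltSketch are hypothetical and are not part of Spark or of the proposed change, and nulls are modelled as Option values.

{code:scala}
// Plain-Scala sketch of the melt semantics from the issue; no Spark involved.
// All types and names below are hypothetical, chosen to mirror the example schema.
final case class An(id: Int)
final case class Str(one: Option[String], two: Option[String])
final case class Wide(an: An, str: Str)
final case class Melted(id: Int, variable: String, value: Option[String])

object MeltSketch {
  // Each value column is a dotted name paired with an accessor for the nested field.
  val valueColumns: Seq[(String, Wide => Option[String])] = Seq(
    "str.one" -> ((w: Wide) => w.str.one),
    "str.two" -> ((w: Wide) => w.str.two)
  )

  // Emit one output row per (input row, value column), keeping an.id as the id.
  def melt(rows: Seq[Wide]): Seq[Melted] =
    for {
      row <- rows
      (name, get) <- valueColumns
    } yield Melted(row.an.id, name, get(row))

  def main(args: Array[String]): Unit = {
    // Same four rows as the example Dataset in the issue description.
    val wide = Seq(
      Wide(An(1), Str(Some("one"), Some("One"))),
      Wide(An(2), Str(Some("two"), None)),
      Wide(An(3), Str(None, Some("three"))),
      Wide(An(4), Str(None, None))
    )
    melt(wide).foreach(println)
  }
}
{code}

Each input row yields one output row per value column, so the four example rows become eight, in the same order as the expected result shown in the issue.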