Daniel Darabos created SPARK-23666:

             Summary: Undeterministic column name with UDFs
                 Key: SPARK-23666
                 URL: https://issues.apache.org/jira/browse/SPARK-23666
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0, 2.2.0
            Reporter: Daniel Darabos

When you access struct fields in Spark SQL and pass them to a UDF, the 
auto-generated result column name includes an internal ID:

scala> import spark.implicits._
scala> Seq(((1, 2), 3)).toDF("a", "b").createOrReplaceTempView("x")
scala> spark.udf.register("f", (a: Int) => a)
scala> spark.sql("select f(a._1) from x").show
+---------------------+
|UDF:f(a._1 AS _1#148)|
+---------------------+
|                    1|
+---------------------+

This ID (#148) is included only for UDFs. A built-in function such as 
factorial gets a name without it:

scala> spark.sql("select factorial(a._1) from x").show
+-----------------------+
|factorial(a._1 AS `_1`)|
+-----------------------+
|                      1|
+-----------------------+

The internal ID is different on every invocation. The problem this causes for 
us is that the schema of the same query never compares equal to itself:

scala> spark.sql("select f(a._1) from x").schema ==
       spark.sql("select f(a._1) from x").schema
Boolean = false

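Until the name is stable, one possible stopgap is to compare column names with 
the ID stripped. The sketch below is plain Scala with no Spark dependency. The 
object and method names (ExprIdNormalizer, stripExprIds, sameNames) are 
hypothetical, and it assumes the ID always appears as '#' followed by digits, 
as in the output above. A column name that legitimately contains such a 
sequence would be altered too.

```scala
object ExprIdNormalizer {
  // Matches '#' followed by one or more digits, e.g. the "#148" in "_1#148".
  // Assumption: this is the only form the internal ID takes in a column name.
  private val ExprIdSuffix = "#\\d+".r

  // Removes every '#<digits>' occurrence from a single column name.
  def stripExprIds(columnName: String): String =
    ExprIdSuffix.replaceAllIn(columnName, "")

  // Compares two lists of column names position by position, ignoring the IDs.
  def sameNames(a: Seq[String], b: Seq[String]): Boolean =
    a.map(stripExprIds) == b.map(stripExprIds)
}
```

With a DataFrame, the inputs would be df.schema.fieldNames. Note that this 
compares names only, not data types or nullability.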
We rely on similar schema checks when reloading persisted data.
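
A possible workaround, not a fix for the bug itself, is to alias the UDF 
result explicitly so that the query supplies the column name instead of Spark 
generating one. The sketch below repeats the setup from this report in a 
standalone program. The alias f_a1, the object name and the local[1] master 
are illustrative assumptions, and Spark must be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object AliasWorkaround {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("alias-workaround")
      .getOrCreate()
    import spark.implicits._

    // Same setup as in the report.
    Seq(((1, 2), 3)).toDF("a", "b").createOrReplaceTempView("x")
    spark.udf.register("f", (a: Int) => a)

    // The explicit alias (f_a1 is a hypothetical name) replaces the
    // auto-generated column name that carries the internal ID.
    val query = "select f(a._1) as f_a1 from x"
    val first = spark.sql(query).schema
    val second = spark.sql(query).schema

    println(first.fieldNames.mkString(", "))
    println(first == second)

    spark.stop()
  }
}
```

This only helps for queries you control. SQL supplied by users without aliases 
would still get the generated name.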
