Simeon Simeonov created SPARK-14048:
---------------------------------------

             Summary: Aggregation operations on structs fail when the structs have fields with special characters
                 Key: SPARK-14048
                 URL: https://issues.apache.org/jira/browse/SPARK-14048
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.0
         Environment: Databricks w/ 1.6.0
            Reporter: Simeon Simeonov


Consider a schema where a struct has field names with special characters, e.g.,

{code}
 |-- st: struct (nullable = true)
 |    |-- x.y: long (nullable = true)
{code}

Schemas such as these are frequently generated by JSON schema inference, which never seems to map JSON data to {{MapType}}, always preferring {{StructType}}.

In SparkSQL, referring to these fields requires backticks, e.g., {{st.`x.y`}}. There is no problem manipulating these structs until an aggregation function is involved. It seems that, under the covers, field names with special characters are not being escaped correctly.
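A plausible sketch of the missing step (a hypothetical helper, not Spark's actual implementation): when serializing a struct type to its string form, any field name that is not a plain identifier should be wrapped in backticks so the {{DataTypeParser}} can re-parse it.

```scala
// Hypothetical sketch: serialize a struct type string in a form the
// DataTypeParser could re-parse. Field names with special characters are
// wrapped in backticks; plain identifiers are left as-is.
// (StructTypeString, quoteIfNeeded, and structString are illustrative names,
// not Spark APIs.)
object StructTypeString {
  private val plainIdentifier = "^[a-zA-Z_][a-zA-Z0-9_]*$".r

  def quoteIfNeeded(name: String): String =
    if (plainIdentifier.findFirstIn(name).isDefined) name
    else s"`$name`" // per the error text, backtick itself is unsupported in a field name

  def structString(fields: Seq[(String, String)]): String =
    fields.map { case (name, tpe) => s"${quoteIfNeeded(name)}:$tpe" }
      .mkString("struct<", ",", ">")
}
```

Under this scheme, a struct with field {{x.y}} would serialize as {{struct<`x.y`:bigint>}} rather than the {{struct<x.y:bigint>}} that the parser rejects below.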

For example, 

{code}
select first(st) as st from tbl group by something
{code}

generates

{code}
org.apache.spark.sql.catalyst.util.DataTypeException: Unsupported dataType: struct<x.y:bigint>. If you have a struct and a field name of it has any special characters, please use backticks (`) to quote that field name, e.g. `x+y`. Please note that backtick itself is not supported in a field name.
  at org.apache.spark.sql.catalyst.util.DataTypeParser$class.toDataType(DataTypeParser.scala:100)
  at org.apache.spark.sql.catalyst.util.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:112)
  at org.apache.spark.sql.catalyst.util.DataTypeParser$.parse(DataTypeParser.scala:116)
  at org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:884)
  at com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:395)
  at com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:394)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at com.databricks.backend.daemon.driver.OutputAggregator$.toJsonSchema(OutputAggregator.scala:394)
  at com.databricks.backend.daemon.driver.OutputAggregator$.maybeApplyOutputAggregation(OutputAggregator.scala:122)
  at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:82)
  at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:42)
  at com.databricks.backend.daemon.driver.DriverLocal.executeSql(DriverLocal.scala:306)
  at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:161)
  at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
  at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
  at scala.util.Try$.apply(Try.scala:161)
  at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:464)
  at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:365)
  at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:196)
  at java.lang.Thread.run(Thread.java:745)
{code}
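Note that the error itself recommends backtick quoting, yet the type string it was asked to parse ({{struct<x.y:bigint>}}) was produced without it. Until that is fixed, a possible workaround is to rename nested fields so no quoting is needed before aggregating. A minimal sketch of such a sanitizer (a hypothetical helper, not a Spark API; the renaming itself would still have to be applied to the DataFrame schema):

```scala
// Hypothetical workaround sketch: rewrite field names so they contain only
// characters that need no backtick quoting in a type string.
def sanitizeFieldName(name: String): String =
  name.replaceAll("[^a-zA-Z0-9_]", "_")

// e.g. "x.y" becomes "x_y", which parses fine unquoted
```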



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
