Simeon Simeonov created SPARK-14048:
---------------------------------------
Summary: Aggregation operations on structs fail when the structs have fields with special characters
Key: SPARK-14048
URL: https://issues.apache.org/jira/browse/SPARK-14048
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.0
Environment: Databricks w/ 1.6.0
Reporter: Simeon Simeonov
Consider a schema where a struct has field names with special characters, e.g.,
{code}
|-- st: struct (nullable = true)
| |-- x.y: long (nullable = true)
{code}
Schemas such as these are frequently generated by the JSON schema generator,
which seems never to map JSON data to {{MapType}}, always preferring
{{StructType}}.
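For illustration, here is roughly how such a schema arises from JSON schema inference in the 1.6 shell (the sample document is made up):
{code}
// A JSON key containing a dot becomes a struct field whose name contains a dot.
val rdd = sc.parallelize(Seq("""{"something": "a", "st": {"x.y": 1}}"""))
val df = sqlContext.read.json(rdd)
df.printSchema()
// root
//  |-- something: string (nullable = true)
//  |-- st: struct (nullable = true)
//  |    |-- x.y: long (nullable = true)
{code}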
In Spark SQL, referring to these fields requires backticks, e.g., {{st.`x.y`}}.
There is no problem manipulating these structs unless one uses an aggregation
function. It seems that, under the covers, the code does not escape field names
with special characters correctly.
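To make the contrast concrete, a minimal sketch of the working case, continuing from the hypothetical data frame above:
{code}
df.registerTempTable("tbl")

// Backtick-quoted access to the struct field works fine outside aggregation.
sqlContext.sql("select st.`x.y` as v from tbl").show()
{code}
Running an aggregation over the same struct, however, fails.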
For example,
{code}
select first(st) as st from tbl group by something
{code}
generates
{code}
org.apache.spark.sql.catalyst.util.DataTypeException: Unsupported dataType: struct<x.y:bigint>. If you have a struct and a field name of it has any special characters, please use backticks (`) to quote that field name, e.g. `x+y`. Please note that backtick itself is not supported in a field name.
  at org.apache.spark.sql.catalyst.util.DataTypeParser$class.toDataType(DataTypeParser.scala:100)
  at org.apache.spark.sql.catalyst.util.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:112)
  at org.apache.spark.sql.catalyst.util.DataTypeParser$.parse(DataTypeParser.scala:116)
  at org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:884)
  at com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:395)
  at com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:394)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at com.databricks.backend.daemon.driver.OutputAggregator$.toJsonSchema(OutputAggregator.scala:394)
  at com.databricks.backend.daemon.driver.OutputAggregator$.maybeApplyOutputAggregation(OutputAggregator.scala:122)
  at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:82)
  at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:42)
  at com.databricks.backend.daemon.driver.DriverLocal.executeSql(DriverLocal.scala:306)
  at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:161)
  at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
  at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
  at scala.util.Try$.apply(Try.scala:161)
  at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:464)
  at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:365)
  at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:196)
  at java.lang.Thread.run(Thread.java:745)
{code}