Michal Kielbowicz created SPARK-17335:
-----------------------------------------
Summary: Creating Hive table from Spark data
Key: SPARK-17335
URL: https://issues.apache.org/jira/browse/SPARK-17335
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.0.0
Reporter: Michal Kielbowicz
Recently my team started using Spark to analyze huge JSON objects. Spark
itself handles them well. The problem starts when we try to create a Hive table
from the data, following the steps in this part of the docs:
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
After running the command `spark.sql("CREATE TABLE x AS (SELECT * FROM y)")` we
get the following exception (obfuscated because the data is confidential):
```
Exception in thread "main" org.apache.spark.sql.AnalysisException:
org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.IllegalArgumentException: Error: : expected at the position 993 of
'string:struct<a:boolean,b:array<string>,c:boolean,d:struct<e:boolean,f:boolean,[...(few
others)],z:boolean,**... 4 more fields**>,[...(rest of valid struct string)]>'
but ' ' is found.;
```
It turned out that the exception was raised because of the `... 4 more fields`
part, which is not a valid representation of the data structure. We believe this
issue is indirectly caused by this PR: https://github.com/apache/spark/pull/13537
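To illustrate the failure mode, here is a minimal toy model in plain Python. It is
not Spark's code: the function names `simple_string` and `catalog_string`, the
default limit of 25, and the exact cutoff rule are assumptions made for
illustration only. It shows how a display-oriented rendering can replace trailing
fields with a placeholder, while a catalog-oriented rendering keeps every field.
```python
def _render(fields):
    """Render (name, type) pairs as 'name:type' strings."""
    return [f"{name}:{dtype}" for name, dtype in fields]


def simple_string(fields, max_fields=25):
    """Toy display rendering: truncates wide structs (assumed rule and limit)."""
    rendered = _render(fields)
    if len(rendered) > max_fields:
        kept = max(0, max_fields - 1)
        dropped = len(rendered) - kept
        # The placeholder below is what makes the type string unparseable.
        rendered = rendered[:kept] + [f"... {dropped} more fields"]
    return "struct<" + ",".join(rendered) + ">"


def catalog_string(fields):
    """Toy catalog rendering: always lists every field."""
    return "struct<" + ",".join(_render(fields)) + ">"


# 28 hypothetical boolean fields, chosen so the toy output drops 4 of them.
fields = [(f"f{i}", "boolean") for i in range(28)]
truncated = simple_string(fields)
full = catalog_string(fields)

print(truncated[-40:])
print("more fields" in full)
```
In this sketch only `full` is a well-formed struct type string; `truncated`
contains the placeholder text in the middle of the type definition.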
An easy workaround is to set `spark.debug.maxToStringFields` to some large
value. Nevertheless, this shouldn't be required: the stringification should use
a method intended to produce a valid type string for Hive.
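For reference, a rough PySpark sketch of the workaround. The value `1000`, the
application name, and the helper `build_session` are arbitrary or hypothetical
choices; it also assumes the setting has to be in place before the session is
created, which we have not verified for every deployment mode.
```python
# Workaround setting from this report; "1000" is an arbitrary large value.
WORKAROUND_CONF = {"spark.debug.maxToStringFields": "1000"}


def build_session(app_name="hive-ctas-workaround"):
    """Create a Hive-enabled session with the workaround applied (sketch)."""
    # Imported lazily so this file can be read without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in WORKAROUND_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage (requires a Spark installation with Hive support):
#   spark = build_session()
#   spark.sql("CREATE TABLE x AS (SELECT * FROM y)")
```
This only hides the symptom for structs narrower than the chosen limit; it does
not change which string representation is handed to Hive.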
In my opinion the root problem is here:
https://github.com/apache/spark/blob/9d7a47406ed538f0005cdc7a62bc6e6f20634815/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala#L318
where the `simpleString` method is called instead of `catalogString`. However,
this class is used in many places and I don't feel experienced enough with Spark
to submit a PR right away.
An almost identical issue was reported in the past. You can find it here:
https://issues.apache.org/jira/browse/SPARK-16415