[
https://issues.apache.org/jira/browse/SPARK-22431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249505#comment-16249505
]
Sunitha Kambhampati commented on SPARK-22431:
---------------------------------------------
*Observations:*
I ran a few tests with the STRUCT containing a `$a` and for the following
scenarios:
a) create table, b) create view, c) create datasource table against the hive
and in-memory catalog.
*A. Hive Catalog*
+1. Create Table (CTAS) - illegal type+ - Results in Error
{code:java}
spark-sql> CREATE TABLE t AS SELECT STRUCT('a' AS `$a`, 1 AS b) q;
17/11/10 22:50:45 WARN ObjectStore: Failed to get database global_temp,
returning NoSuchObjectException
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.IllegalArgumentException: Error: name expected at the position 7 of
'struct<$a:string,b:int>' but '$' is found.;
{code}
+2 Create Table – illegal type+ - Results in Error
{code:java}
CREATE TABLE t(q STRUCT<`$a`:INT,col2:STRING>, i1 INT);
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.IllegalArgumentException: Error: name expected at the position 7 of
'struct<$a:int,col2:string>:int' but '$' is found.;
{code}
+3 Create DataSourceTable - illegal type+ -
Try to store it in hive compatible way if possible, if this fails, then it
tries to store the metadata in Spark SQL format. This is successful
With Parquet, there is error trying to store in hive compatible way so it falls
back to persisting the metadata in Spark SQL specific format.
{code:java}
CREATE TABLE t(q STRUCT<`$a`:INT,col2:STRING>, i1 INT) USING PARQUET;
17/11/10 22:52:40 WARN HiveExternalCatalog: Could not persist `default`.`t` in
a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific
format.
org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.IllegalArgumentException: Error: name expected at the position 7 of
'struct<$a:int,col2:string>:int' but '$' is found.
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:720)
Caused by: java.lang.IllegalArgumentException: Error: name expected at the
position 7 of 'struct<$a:int,col2:string>:int' but '$' is found.
{code}
Retrieving the table metadata – OK
{code:java}
select * from t;
Time taken: 0.912 seconds
spark-sql> describe formatted t;
q struct<$a:int,col2:string> NULL
i1 int NULL
# Detailed Table Information
Database default
Table t
Owner ksunitha
Created Time Fri Nov 10 22:52:40 IST 2017
Last Access Thu Jan 01 05:30:00 IST 1970
Created By Spark 2.3.0-SNAPSHOT
Type MANAGED
Provider PARQUET
Table Properties [transient_lastDdlTime=1510334560]
Location file:/Users/ksunitha/projects/trunk/spark/spark-warehouse/t
Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Storage Properties [serialization.format=1]
Time taken: 0.071 seconds, Fetched 18 row(s)
{code}
+4. Create View - illegal type+
Creation successful.
Retrieving the view metadata – fails, so select, drop fail.
{code:java}
CREATE VIEW t AS SELECT STRUCT('a' AS `$a`, 1 AS b) q;
Time taken: 0.036 seconds
spark-sql> select * from t;
17/11/10 22:57:22 ERROR SparkSQLDriver: Failed in [select * from t]
org.apache.spark.SparkException: Cannot recognize hive type string:
struct<$a:string,b:int>
{code}
--
*B. InMemoryCatalog*
+1.Create Table - illegal type+
N/A – Hive Support is needed
+2. Create DataSourceTable - illegal type+
OK/Successful
{code:java}
CREATE TABLE t(q STRUCT<`$a`:INT,col2:STRING>, i1 INT) USING PARQUET
{code}
Retrieving the table metadata – Select Query – OK
+3. Create View - illegal type+
Creation successful.
{code:java}
CREATE VIEW t AS SELECT STRUCT('a' AS `$a`, 1 AS b) q
{code}
Retrieving the view metadata and select query – OK
----
*Cause:*
# When you store the table metadata with provider as Hive, then it stores the
schema that has a struct with illegal name in the catalog.
a. If *table metadata* is being stored, the underlying serde
initialization catches any illegal parameters and throws an exception.
b. If *view metadata* is being stored, then hive has some special case
logic and will not trigger any checks on the illegal parameters so the metadata
for the view gets stored(/created) in the Hive metastore. But when retrieving a
hive table’s metadata, Spark will make sure that it can read the schema and it
is compatible with Spark and so when it retrieves the information from the Hive
Metastore, it checks that the column datatype can be parsed by Spark. Spark is
not able to parse the struct with the illegal name because it is not quoted
(backticked).
# When you store the table metadata for a non-hive provider, then there are no
checks done which is why the create and the select statements work fine.
# CatalystSqlParser.parseDataType cannot parse the schema when the struct has
an illegal name without the backtick.
As the tests show, there are differences in the behavior.
*Some possible solutions:*
# Fix the hive table/view codepath to check whether the schema datatype is
parseable by Spark before persisting it in the metastore. This change is
localized to HiveClientImpl to do the check similar to the check in
FromHiveColumn. This is fail-fast and we will avoid the scenario where we
write something to the metastore that we are unable to read it back ( ie the
scenario in this jira description)
# Improve storing and handling of the nested datatypes that have quoted
(backtick identifiers) – This would need more changes after more
investigation/input.
# The tests show that there are differences in the behavior when using the
in-memory versus the hive catalog. Resolve inconsistent behavior ?
What do we want the desired behavior to be ? Please share your
thoughts/comments. Thanks.
> Creating Permanent view with illegal type
> -----------------------------------------
>
> Key: SPARK-22431
> URL: https://issues.apache.org/jira/browse/SPARK-22431
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Herman van Hovell
>
> It is possible in Spark SQL to create a permanent view that uses an nested
> field with an illegal name.
> For example if we create the following view:
> {noformat}
> create view x as select struct('a' as `$q`, 1 as b) q
> {noformat}
> A simple select fails with the following exception:
> {noformat}
> select * from x;
> org.apache.spark.SparkException: Cannot recognize hive type string:
> struct<$q:string,b:int>
> at
> org.apache.spark.sql.hive.client.HiveClientImpl$.fromHiveColumn(HiveClientImpl.scala:812)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:378)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:378)
> ...
> {noformat}
> Dropping the view isn't possible either:
> {noformat}
> drop view x;
> org.apache.spark.SparkException: Cannot recognize hive type string:
> struct<$q:string,b:int>
> at
> org.apache.spark.sql.hive.client.HiveClientImpl$.fromHiveColumn(HiveClientImpl.scala:812)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:378)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:378)
> ...
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]