[ 
https://issues.apache.org/jira/browse/SPARK-22431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249505#comment-16249505
 ] 

Sunitha Kambhampati commented on SPARK-22431:
---------------------------------------------

*Observations:*
I ran a few tests with the STRUCT containing a `$a` and for the following 
scenarios:
a) create table, b) create view, c) create datasource table against the hive 
and in-memory catalog.   

*A. Hive Catalog*
+1. Create Table (CTAS) - illegal type+   - Results in Error
{code:java}
spark-sql> CREATE TABLE t AS SELECT STRUCT('a' AS `$a`, 1 AS b) q;
17/11/10 22:50:45 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException: Error: name expected at the position 7 of 
'struct<$a:string,b:int>' but '$' is found.;
{code}

+2 Create Table – illegal type+  -      Results in Error

{code:java}
CREATE TABLE t(q STRUCT<`$a`:INT,col2:STRING>, i1 INT);
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException: Error: name expected at the position 7 of 
'struct<$a:int,col2:string>:int' but '$' is found.;
{code}

+3 Create DataSourceTable  - illegal type+  - 
Try to store it in hive compatible way if possible, if this fails, then it 
tries to  store the metadata in Spark SQL format.   This is successful
With Parquet, there is error trying to store in hive compatible way so it falls 
back to persisting the metadata in Spark SQL specific format. 
{code:java}
CREATE TABLE t(q STRUCT<`$a`:INT,col2:STRING>, i1 INT) USING PARQUET;   
17/11/10 22:52:40 WARN HiveExternalCatalog: Could not persist `default`.`t` in 
a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific 
format.
org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException: Error: name expected at the position 7 of 
'struct<$a:int,col2:string>:int' but '$' is found.
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:720)
Caused by: java.lang.IllegalArgumentException: Error: name expected at the 
position 7 of 'struct<$a:int,col2:string>:int' but '$' is found.
{code}
Retrieving the table metadata –  OK

{code:java}
select * from t;
Time taken: 0.912 seconds
spark-sql> describe formatted t;
q       struct<$a:int,col2:string>      NULL
i1      int     NULL
                
# Detailed Table Information            
Database        default 
Table   t       
Owner   ksunitha        
Created Time    Fri Nov 10 22:52:40 IST 2017    
Last Access     Thu Jan 01 05:30:00 IST 1970    
Created By      Spark 2.3.0-SNAPSHOT    
Type    MANAGED 
Provider        PARQUET 
Table Properties        [transient_lastDdlTime=1510334560]      
Location        file:/Users/ksunitha/projects/trunk/spark/spark-warehouse/t     
Serde Library   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe      
InputFormat     org.apache.hadoop.mapred.SequenceFileInputFormat        
OutputFormat    org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat       
Storage Properties      [serialization.format=1]        
Time taken: 0.071 seconds, Fetched 18 row(s)
{code}

+4. Create View - illegal type+ 
Creation successful.
Retrieving the view metadata – fails, so select, drop fail.

{code:java}
CREATE VIEW t AS SELECT STRUCT('a' AS `$a`, 1 AS b) q;
Time taken: 0.036 seconds
spark-sql> select * from t;
17/11/10 22:57:22 ERROR SparkSQLDriver: Failed in [select * from t]
org.apache.spark.SparkException: Cannot recognize hive type string: 
struct<$a:string,b:int>
{code}

--
*B. InMemoryCatalog*
+1.Create Table -  illegal type+
N/A – Hive Support is needed 

+2. Create DataSourceTable  - illegal type+
OK/Successful
{code:java}
CREATE TABLE t(q STRUCT<`$a`:INT,col2:STRING>, i1 INT) USING PARQUET
{code}
Retrieving the table metadata – Select Query – OK

+3. Create View - illegal type+
Creation successful.
{code:java}
CREATE VIEW t AS SELECT STRUCT('a' AS `$a`, 1 AS b) q
{code}
Retrieving the view metadata and select query – OK 


----
*Cause:*
# When you store the table metadata with provider as Hive, then it stores the 
schema that has a struct with illegal name in the catalog. 
a.      If *table metadata* is being stored, the underlying serde 
initialization catches any illegal parameters and throws an exception. 
b.      If *view metadata* is being stored, then hive has some special case 
logic and will not trigger any checks on the illegal parameters so the metadata 
for the view gets stored(/created) in the Hive metastore. But when retrieving a 
hive table’s metadata, Spark will make sure that it can read the schema and it 
is compatible with Spark and so when it retrieves the information from the Hive 
Metastore, it checks that the column datatype can be parsed by Spark.  Spark is 
not able to parse the struct with the illegal name because it is not quoted 
(backticked). 
# When you store the table metadata for a non-hive provider, then there are no 
checks done which is why the create and the select statements work fine.  
# CatalystSqlParser.parseDataType cannot parse the schema when the struct has 
an illegal name without the backtick. 

As the tests show, there are differences in the behavior.  

*Some possible solutions:*
# Fix the hive table/view codepath to check whether the schema datatype is 
parseable by Spark before persisting it in the metastore. This change is 
localized to HiveClientImpl to do the check similar to the check in 
FromHiveColumn.  This is fail-fast and we will avoid the scenario where we 
write something to the metastore that we are unable to read it back ( ie the 
scenario in this jira description)
# Improve storing and handling of the nested datatypes that have quoted 
(backtick identifiers) – This would need more changes after more 
investigation/input.
# The tests show that there are differences in the behavior when using the 
in-memory versus the hive catalog.  Resolve inconsistent behavior ?

What do we want the desired behavior to be ?   Please share your 
thoughts/comments.  Thanks.

> Creating Permanent view with illegal type
> -----------------------------------------
>
>                 Key: SPARK-22431
>                 URL: https://issues.apache.org/jira/browse/SPARK-22431
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Herman van Hovell
>
> It is possible in Spark SQL to create a permanent view that uses an nested 
> field with an illegal name.
> For example if we create the following view:
> {noformat}
> create view x as select struct('a' as `$q`, 1 as b) q
> {noformat}
> A simple select fails with the following exception:
> {noformat}
> select * from x;
> org.apache.spark.SparkException: Cannot recognize hive type string: 
> struct<$q:string,b:int>
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.fromHiveColumn(HiveClientImpl.scala:812)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:378)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:378)
> ...
> {noformat}
> Dropping the view isn't possible either:
> {noformat}
> drop view x;
> org.apache.spark.SparkException: Cannot recognize hive type string: 
> struct<$q:string,b:int>
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.fromHiveColumn(HiveClientImpl.scala:812)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:378)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:378)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to