Joachim Hereth created SPARK-20808:
--------------------------------------
Summary: External Table unnecessarily not created in
Hive-compatible way
Key: SPARK-20808
URL: https://issues.apache.org/jira/browse/SPARK-20808
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.1, 2.1.0
Reporter: Joachim Hereth
Priority: Minor
In Spark 2.1.0 and 2.1.1, {{spark.catalog.createExternalTable}} unnecessarily
creates tables in a Hive-incompatible way.
For instance, executing the following in a Spark shell
{code}
val database = "default"
val table = "table_name"
val path = "/user/daki/" + database + "/" + table

val data = Array(("Alice", 23), ("Laura", 33), ("Peter", 54))
val df = sc.parallelize(data).toDF("name", "age")
df.write.mode(org.apache.spark.sql.SaveMode.Overwrite).parquet(path)

spark.sql("DROP TABLE IF EXISTS " + database + "." + table)
spark.catalog.createExternalTable(database + "." + table, path)
{code}
issues the warning
{code}
Search Subject for Kerberos V5 INIT cred (<<DEF>>,
sun.security.jgss.krb5.Krb5InitCredential)
17/05/19 11:01:17 WARN hive.HiveExternalCatalog: Could not persist
`default`.`table_name` in a Hive compatible way. Persisting it into Hive
metastore in Spark SQL specific format.
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:User
daki does not have privileges for CREATETABLE)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:720)
...
{code}
The exception (user does not have privileges for CREATETABLE) is misleading:
I do have the CREATE TABLE privilege.
Querying the table with Hive returns no results, while with Spark one can
access the data.
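One way to see how the table was registered is to inspect its metastore entry (a sketch, assuming the spark-shell session from the reproduction above; the table name matches the variables used there):

{code}
// Show the metastore details for the table created above. For a table
// persisted "in Spark SQL specific format", the SerDe and input format
// entries do not point at the Parquet SerDe, which is why Hive sees no data.
spark.sql("DESCRIBE FORMATTED default.table_name").show(100, false)
{code}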
The following code creates the table correctly (workaround):
{code}
def sqlStatement(df: org.apache.spark.sql.DataFrame, database: String,
                 table: String, path: String): String = {
  // Render each schema field as a backtick-quoted column definition.
  val columns = (for (col <- df.schema)
    yield "`" + col.name + "` " + col.dataType.simpleString).mkString(",\n")
  ("CREATE EXTERNAL TABLE `%s`.`%s` (%s) " +
    "STORED AS PARQUET " +
    "LOCATION 'hdfs://nameservice1%s'").format(database, table, columns, path)
}

spark.sql("DROP TABLE IF EXISTS " + database + "." + table)
spark.sql(sqlStatement(df, database, table, path))
{code}
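To double-check the workaround, the table can be read back through the metastore rather than the raw HDFS path (a sketch; uses the same {{database}} and {{table}} values as above). If the DDL registered it Hive-compatibly, the same query also returns rows in Hive:

{code}
// Query via the metastore table, not the Parquet path, so the result
// reflects the table definition Hive sees as well.
spark.sql("SELECT name, age FROM " + database + "." + table).show()
{code}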
The code is executed via YARN against a Cloudera CDH 5.7.5 cluster with Sentry
enabled (in case this matters regarding the privilege warning). Spark was built
against the CDH libraries.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)