[
https://issues.apache.org/jira/browse/SPARK-18355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901914#comment-15901914
]
Dongjoon Hyun commented on SPARK-18355:
---------------------------------------
I confirmed that this happens only with
`spark.sql.hive.convertMetastoreOrc=true`.
{code}
scala> spark.version
res0: String = 2.2.0-SNAPSHOT
scala> sql("select click_id, search_id from testorc").show
+--------+---------+
|click_id|search_id|
+--------+---------+
| 12| 2|
+--------+---------+
scala> sql("set spark.sql.hive.convertMetastoreOrc=true").show(false)
+----------------------------------+-----+
|key |value|
+----------------------------------+-----+
|spark.sql.hive.convertMetastoreOrc|true |
+----------------------------------+-----+
scala> sql("select click_id, search_id from testorc").show
17/03/08 12:15:43 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.IllegalArgumentException: Field "click_id" does not exist.
{code}
> Spark SQL fails to read data from a ORC hive table that has a new column
> added to it
> ------------------------------------------------------------------------------------
>
> Key: SPARK-18355
> URL: https://issues.apache.org/jira/browse/SPARK-18355
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.2
> Environment: Centos6
> Reporter: Sandeep Nemuri
>
> *PROBLEM*:
> Spark SQL fails to read data from a ORC hive table that has a new column
> added to it.
> Below is the exception:
> {code}
> scala> sqlContext.sql("select click_id,search_id from testorc").show
> 16/11/03 22:17:53 INFO ParseDriver: Parsing command: select
> click_id,search_id from testorc
> 16/11/03 22:17:54 INFO ParseDriver: Parse Completed
> java.lang.AssertionError: assertion failed
> at scala.Predef$.assert(Predef.scala:165)
> at
> org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:39)
> at
> org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:38)
> at scala.Option.map(Option.scala:145)
> at
> org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:38)
> at
> org.apache.spark.sql.execution.datasources.LogicalRelation.copy(LogicalRelation.scala:31)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$convertToOrcRelation(HiveMetastoreCatalog.scala:588)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:647)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:643)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
> at
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> {code}
> *STEPS TO SIMULATE THIS ISSUE*:
> 1) Create table in hive.
> {code}
> CREATE TABLE `testorc`(
> `click_id` string,
> `search_id` string,
> `uid` bigint)
> PARTITIONED BY (
> `ts` string,
> `hour` string)
> STORED AS ORC;
> {code}
> 2) Load data into table :
> {code}
> INSERT INTO TABLE testorc PARTITION (ts = '98765',hour = '01' ) VALUES
> (12,2,12345);
> {code}
> 3) Select through spark shell (This works)
> {code}
> sqlContext.sql("select click_id,search_id from testorc").show
> {code}
> 4) Now add column to hive table
> {code}
> ALTER TABLE testorc ADD COLUMNS (dummy string);
> {code}
> 5) Now again select from spark shell
> {code}
> scala> sqlContext.sql("select click_id,search_id from testorc").show
> 16/11/03 22:17:53 INFO ParseDriver: Parsing command: select
> click_id,search_id from testorc
> 16/11/03 22:17:54 INFO ParseDriver: Parse Completed
> java.lang.AssertionError: assertion failed
> at scala.Predef$.assert(Predef.scala:165)
> at
> org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:39)
> at
> org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:38)
> at scala.Option.map(Option.scala:145)
> at
> org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:38)
> at
> org.apache.spark.sql.execution.datasources.LogicalRelation.copy(LogicalRelation.scala:31)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$convertToOrcRelation(HiveMetastoreCatalog.scala:588)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:647)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:643)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
> at
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]