[
https://issues.apache.org/jira/browse/SPARK-35762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-35762.
----------------------------------
Resolution: Duplicate
It's a duplicate of SPARK-35700
> Errors while using spark-sql read hive 3.1 orc table
> ----------------------------------------------------
>
> Key: SPARK-35762
> URL: https://issues.apache.org/jira/browse/SPARK-35762
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.1, 3.1.2
> Environment:
> OS: Centos7.7 x86_64
> JDK: Oracle JDK 1.8.0_191
> Hadoop: 2.8.4
> Hive: 3.1.2
> Spark: 3.1.0
>
> A simple 3-node Hadoop cluster.
> Hive with the metastore service started.
> Reporter: laokong
> Priority: Major
> Attachments: full-stack-trace.log
>
>
> ==== steps to reproduce ====
> 1. Create an ORC table in Hive
> ```
> hive> drop table demo;
> OK
> Time taken: 0.514 seconds
> hive> create table demo(`id` varchar(20)) stored as orc;
> OK
> Time taken: 0.134 seconds
> hive> insert into table demo(id) values('111');
> hive> select * from demo;
> OK
> 111
> Time taken: 0.291 seconds, Fetched: 1 row(s)
> ```
>
> 2. Copy hive-site.xml to spark/conf
> ```
> cd /root/spark-3.1.2-bin-hadoop2.7/conf
> ln -s /usr/hive/conf/hive-site.xml .
> ```
> 3. Execute the SQL in spark-sql
> ```
> [root@master spark-3.1.2-bin-hadoop2.7]# ./bin/spark-sql
> 21/06/14 23:32:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 21/06/14 23:32:17 WARN conf.HiveConf: HiveConf of name hive.metastore.db.type does not exist
> Spark master: local[*], Application Id: local-1623727939853
> spark-sql> select * from demo where id='222';
> java.lang.UnsupportedOperationException: DataType: varchar(20)
>         at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getPredicateLeafType(OrcFilters.scala:150)
>         at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getType$1(OrcFilters.scala:222)
>         at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:266)
>         at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:132)
>         at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:135)
>         at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
>         at scala.collection.immutable.List.foreach(List.scala:392)
>         at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
>         at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
>         at scala.collection.immutable.List.flatMap(List.scala:355)
>         at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:134)
>         at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:73)
>         at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189)
>         at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188)
>         at scala.Option.foreach(Option.scala:407)
>         at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188)
>         at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
>         at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
>         at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>         at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>         at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
>         at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
>         at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>         at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>         at org.apache.spark.scheduler.Task.run(Task.scala:131)
>         at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> ```
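>
> The stack trace shows the failure happens while the pushed-down filter `id='222'` is converted into an ORC search argument (`OrcFilters.createFilter`), where the native ORC reader rejects the `varchar(20)` type. A quick way to narrow it down from the shell (a sketch only, output not reproduced here; `-e` runs a single query and `--conf` is the long form of the `-c` option used in the summary below):
> ```
> # A plain scan has no filter to push down, so it should not reach
> # OrcFilters.createFilter and is expected to succeed.
> ./bin/spark-sql -e "select * from demo"
>
> # The same filtered query with the workaround from the summary below:
> # fall back to the Hive ORC reader implementation.
> ./bin/spark-sql --conf spark.sql.orc.impl=hive -e "select * from demo where id='222'"
> ```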
>
> ==== summary ====
> 1. Tested in spark-3.1.1 and spark-3.1.2; both report this error.
> If spark-sql is started with the option `-c spark.sql.orc.impl=hive`, the problem disappears.
>
> 2. Tested in spark-3.0.2 without any problem.
>
> 3. With the option `-c spark.sql.orc.impl=hive`, spark-3.1.2 is slower than spark-3.0.2 without this option.
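>
> To make the fallback apply to every session until upgrading to a release containing the SPARK-35700 fix, one option (a sketch, assuming the default conf layout of this install) is to put the setting into spark-defaults.conf instead of passing `-c` each time:
> ```
> # Append the workaround so every spark-sql / spark-submit invocation
> # picks it up without an extra command-line option.
> echo "spark.sql.orc.impl hive" >> /root/spark-3.1.2-bin-hadoop2.7/conf/spark-defaults.conf
> ```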