lordk911 opened a new issue, #5414: URL: https://github.com/apache/kyuubi/issues/5414
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.

### Describe the bug

- spark: 3.3.3
- remote hive: hdp3.1.0
- local hive: hdp3.1.0
- spark-hive-connector: master

Compile command:

```
mvn clean install -pl extensions/spark/kyuubi-spark-connector-hive -DskipTests -Dspark.version=3.3.3
```

From spark-shell I can run `spark.sql("select * from hive_catalog.crm_bi.XXX").show`, but when I run `spark.sql("select count(*) from hive_catalog.crm_bi.XXX").show` I get this error:

```
java.lang.NumberFormatException: For input string: ""
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:592)
	at java.lang.Integer.parseInt(Integer.java:615)
	at org.apache.hadoop.hive.serde2.ColumnProjectionUtils.getReadColumnIDs(ColumnProjectionUtils.java:186)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.genIncludedColumns(OrcInputFormat.java:399)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:322)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:232)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1848)
	at org.apache.kyuubi.spark.connector.hive.read.HivePartitionReaderFactory$$anon$2.liftedTree1$1(HivePartitionReaderFactory.scala:130)
	at org.apache.kyuubi.spark.connector.hive.read.HivePartitionReaderFactory$$anon$2.<init>(HivePartitionReaderFactory.scala:129)
	at org.apache.kyuubi.spark.connector.hive.read.HivePartitionReaderFactory.createPartitionWritableReader(HivePartitionReaderFactory.scala:122)
	at org.apache.kyuubi.spark.connector.hive.read.HivePartitionReaderFactory.buildReaderInternal(HivePartitionReaderFactory.scala:91)
	at org.apache.kyuubi.spark.connector.hive.read.HivePartitionReaderFactory.$anonfun$createReader$1(HivePartitionReaderFactory.scala:75)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at org.apache.kyuubi.spark.connector.hive.read.SparkFilePartitionReader.getNextReader(SparkFilePartitionReader.scala:99)
	at org.apache.kyuubi.spark.connector.hive.read.SparkFilePartitionReader.next(SparkFilePartitionReader.scala:46)
	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:119)
	at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:156)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
	at scala.Option.exists(Option.scala:376)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:97)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithoutKey_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

After I hit this error, running `spark.sql("select * from hive_catalog.crm_bi.XXX").show` again fails with the same error.
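The trace fails inside `ColumnProjectionUtils.getReadColumnIDs` when `Integer.parseInt` receives an empty string. My reading (an assumption, not verified against the connector code) is that for `count(*)` no columns are projected, so the column-ID list the Hive reader picks up from the job conf (presumably `hive.io.file.readcolumn.ids`) is empty, and parsing it fails. A minimal snippet that reproduces the same exception in spark-shell:

```scala
// Hypothetical illustration of the failing parse, not the connector's actual code:
// for count(*) no columns are projected, so the column-ID string is empty.
val readColumnIds = ""                             // what the reader seems to see for count(*)
val parts = readColumnIds.split(",").map(_.trim)   // "" splits into Array("")
parts.map(Integer.parseInt)                        // java.lang.NumberFormatException: For input string: ""
```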
### Affects Version(s)

master

### Kyuubi Server Log Output

_No response_

### Kyuubi Engine Log Output

_No response_

### Kyuubi Server Configurations

_No response_

### Kyuubi Engine Configurations

```yaml
spark.sql.catalog.hive_catalog org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hive_catalog.hive.metastore.uris thrift://hdpdev246:9083,thrift://hdpdev248:9083
```

I also tried to configure additional metastore settings:

```yaml
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.version 3.1.0
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.jars path
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.jars.path file:///data/soft/standalone-metastore/*.jar
```

but this did not help.

### Additional context

_No response_

### Are you willing to submit PR?

- [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- [ ] No. I cannot submit a PR at this time.
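PS: in case it helps with reproducing, the catalog settings above can also be set from within spark-shell before the catalog is first used. This is only a sketch of how I would translate my configuration entries; whether runtime registration behaves exactly like spark-defaults is an assumption on my part.

```scala
// Sketch: same catalog settings as in the configuration above, applied from spark-shell.
// The catalog name "hive_catalog" and the metastore URIs are copied from my config.
spark.conf.set("spark.sql.catalog.hive_catalog",
  "org.apache.kyuubi.spark.connector.hive.HiveTableCatalog")
spark.conf.set("spark.sql.catalog.hive_catalog.hive.metastore.uris",
  "thrift://hdpdev246:9083,thrift://hdpdev248:9083")

spark.sql("select * from hive_catalog.crm_bi.XXX").show()        // works
spark.sql("select count(*) from hive_catalog.crm_bi.XXX").show() // fails with NumberFormatException
```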
