[GitHub] [iceberg] 1ambda edited a comment on issue #3295: Can spark operates iceberg table created by hive?

GitBox Tue, 07 Dec 2021 16:49:02 -0800


1ambda edited a comment on issue #3295:
URL: https://github.com/apache/iceberg/issues/3295#issuecomment-988384976



   Tested with Spark 3.1.2 (Iceberg runtime 0.12.1), but I still get the same error.
   
   Presto usually needs very little setup: prestodb 0.266 embeds the required 
Iceberg runtime (0.12.1) by default, and the user only needs to provide the 
catalog configuration for Iceberg. Since that configuration is quite simple, I 
don't think Presto is the cause of this problem (even though the table was 
created in Presto). The Spark Iceberg runtime doesn't seem to be able to read 
this file properly.
   
   
   ```
   Spark session available as 'spark'.
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
         /_/
   
   Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
   Type in expressions to have them evaluated.
   Type :help for more information.
   
   scala> spark.sql("SELECT * FROM hive_prod.test_db.test_table ...").show()
   res0: org.apache.spark.sql.DataFrame = [placeno: col_a, col_b: string ... 1 more field]
   
   scala> spark.sql("SELECT * FROM hive_prod.test_db.test_table ...").show()
   21/12/08 09:42:22 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
   java.lang.ClassCastException: [B cannot be cast to org.apache.spark.unsafe.types.UTF8String
           at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String(rows.scala:46)
           at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String$(rows.scala:46)
           at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:195)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
           at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
           at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
           at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
           at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
           at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
           at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:131)
           at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   21/12/08 09:42:22 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (hive-metastore-2-66d6c89cf-jxbcg executor driver): java.lang.ClassCastException: [B cannot be cast to org.apache.spark.unsafe.types.UTF8String
   
   ```
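
   For anyone reading the trace: `[B` is the JVM's runtime name for `byte[]`, 
so the message says a raw byte array showed up where Spark's row accessor 
expected a string value. The snippet below is only a minimal sketch of that 
kind of failure using plain JVM types. It does not reproduce the Iceberg read 
path, and the value `"col_a"` and the names `raw`, `attempt`, and `decoded` 
are made up for illustration. It can be pasted into the same spark-shell.

   ```scala
   import scala.util.Try

   // Hypothetical stand-in for a column value that arrives as raw bytes.
   val raw: Any = "col_a".getBytes("UTF-8")

   // The JVM names byte[] as "[B", which is what the exception message shows.
   println(raw.getClass.getName)

   // Casting the untyped value to a string type fails at runtime with a
   // ClassCastException, the same class of error as in the trace above.
   val attempt = Try(raw.asInstanceOf[String])
   println(attempt.isFailure)

   // Decoding the bytes explicitly is what a reader has to do instead of casting.
   val decoded = new String(raw.asInstanceOf[Array[Byte]], "UTF-8")
   println(decoded)
   ```

   Whether the bytes here really are UTF-8 text that the reader failed to 
convert is an assumption on my side; the trace alone only shows the type 
mismatch.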


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


