ggangadharan commented on PR #5590:
URL: https://github.com/apache/hive/pull/5590#issuecomment-2567772194

   @okumin  Thanks for the update
   
   I successfully read the Iceberg table **(previously migrated from Hive)** using **spark.sql** in **Spark**, and it worked as expected.
   
   Attaching the **spark3-shell** output for reference.
   
   ```
   scala> spark.sql("DESCRIBE  TABLE formatted 
default.hive_28518_test").show(false)
   
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
   |col_name                    |data_type                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |comment|
   
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
   |id                          |int                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |null   |
   |name                        |string                                         
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |null   |
   |dt                          |timestamp_ntz                                  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |null   |
   |                            |                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |# Metadata Columns          |                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |_spec_id                    |int                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |_partition                  |struct<>                                       
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |_file                       |string                                         
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |_pos                        |bigint                                         
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |_deleted                    |boolean                                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |                            |                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |# Detailed Table Information|                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |Name                        |spark_catalog.default.hive_28518_test          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |Type                        |MANAGED                                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |Location                    
|hdfs://ns1/warehouse/tablespace/external/hive/hive_28518_test                  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
      |       |
   |Provider                    |iceberg                                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      |       |
   |Table Properties            
|[EXTERNAL=TRUE,MIGRATED_TO_ICEBERG=true,OBJCAPABILITIES=EXTREAD,EXTWRITE,current-snapshot-id=4868016704679265240,engine.hive.enabled=true,format=iceberg/parquet,format-version=2,iceberg.orc.files.only=false,last_modified_by=hive,last_modified_time=1735822103,schema.name-mapping.default=[
 {\n  "field-id" : 1,\n  "names" : [ "id" ]\n}, {\n  "field-id" : 2,\n  "names" 
: [ "name" ]\n}, {\n  "field-id" : 3,\n  "names" : [ "dt" ]\n} 
],storage_handler=org.apache.iceberg.mr.hive.HiveIcebergStorageHandler,table_type=ICEBERG,write.delete.mode=merge-on-read,write.format.default=parquet,write.merge.mode=merge-on-read,write.update.mode=merge-on-read]|
       |
   
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
   
   
   scala> spark.sql("select dt from default.hive_28518_test").show(10,false)
   +--------------------------+
   |dt                        |
   +--------------------------+
   |2024-08-09 14:08:26.326107|
   +--------------------------+
   ```
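   
   As an extra sanity check one could run from the same session (not part of the output above; it assumes Iceberg's metadata tables are reachable through `spark_catalog`), the snapshot created by the migration can be listed:
   
   ```
   // Hypothetical follow-up query, not from the test run above: list the table's
   // snapshots and confirm the migration snapshot (current-snapshot-id in the
   // table properties) is visible from Spark.
   scala> spark.sql("SELECT committed_at, snapshot_id, operation FROM default.hive_28518_test.snapshots").show(false)
   ```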
   
   FYI 
   
   While reading the string column `name`, I encountered an error that has already been reported [here](https://github.com/apache/iceberg/issues/11367). Since it is a Spark/Iceberg issue, we can ignore it for now.
   
   ```
   scala> spark.sql("select name from default.hive_28518_test").show()
   25/01/02 12:59:29 WARN  scheduler.TaskSetManager: [task-result-getter-3]: Lost task 0.0 in stage 5.0 (TID 11) (ccycloud-2.nightly7310-ec.root.comops.site executor 2): java.lang.UnsupportedOperationException: Unsupported type: UTF8String
        at org.apache.iceberg.arrow.vectorized.ArrowVectorAccessor.getUTF8String(ArrowVectorAccessor.java:81)
        at org.apache.iceberg.spark.data.vectorized.IcebergArrowColumnVector.getUTF8String(IcebergArrowColumnVector.java:138)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
        at org.apache.spark.scheduler.Task.run(Task.scala:139)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:574)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1530)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:577)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   25/01/02 12:59:29 ERROR scheduler.TaskSetManager: [task-result-getter-2]: Task 0 in stage 5.0 failed 4 times; aborting job
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 14) (ccycloud-2.nightly7310-ec.root.comops.site executor 2): java.lang.UnsupportedOperationException: Unsupported type: UTF8String
        at org.apache.iceberg.arrow.vectorized.ArrowVectorAccessor.getUTF8String(ArrowVectorAccessor.java:81)
        at org.apache.iceberg.spark.data.vectorized.IcebergArrowColumnVector.getUTF8String(IcebergArrowColumnVector.java:138)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
        at org.apache.spark.scheduler.Task.run(Task.scala:139)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:574)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1530)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:577)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Driver stacktrace:
     at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785)
     at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721)
     at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720)
     at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
     at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720)
     at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1206)
     at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1206)
     at scala.Option.foreach(Option.scala:407)
     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1206)
     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2984)
     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923)
     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912)
     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971)
     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2279)
     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2300)
     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2319)
     at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:530)
     at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:483)
     at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:61)
     at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:4183)
     at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:3167)
     at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4173)
     at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:527)
     at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:4171)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4171)
     at org.apache.spark.sql.Dataset.head(Dataset.scala:3167)
     at org.apache.spark.sql.Dataset.take(Dataset.scala:3388)
     at org.apache.spark.sql.Dataset.getRows(Dataset.scala:290)
     at org.apache.spark.sql.Dataset.showString(Dataset.scala:329)
     at org.apache.spark.sql.Dataset.show(Dataset.scala:815)
     at org.apache.spark.sql.Dataset.show(Dataset.scala:774)
     at org.apache.spark.sql.Dataset.show(Dataset.scala:783)
     ... 47 elided
   Caused by: java.lang.UnsupportedOperationException: Unsupported type: UTF8String
     at org.apache.iceberg.arrow.vectorized.ArrowVectorAccessor.getUTF8String(ArrowVectorAccessor.java:81)
     at org.apache.iceberg.spark.data.vectorized.IcebergArrowColumnVector.getUTF8String(IcebergArrowColumnVector.java:138)
     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
     at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
     at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
     at org.apache.spark.scheduler.Task.run(Task.scala:139)
     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:574)
     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1530)
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:577)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:748)
   
   ```
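   
   A possible workaround (not verified in this environment, and only a sketch assuming the failure is specific to Iceberg's vectorized Arrow reader) would be to fall back to non-vectorized reads for this table or for a single query:
   
   ```
   // Untested sketch: turn off Iceberg's vectorized Parquet reads so Spark falls
   // back to the row-based reader. Per table, via a table property:
   scala> spark.sql("ALTER TABLE default.hive_28518_test SET TBLPROPERTIES ('read.parquet.vectorization.enabled'='false')")

   // Or per read, via Iceberg's 'vectorization-enabled' read option:
   scala> spark.read.format("iceberg").option("vectorization-enabled", "false").load("default.hive_28518_test").select("name").show()
   ```
   
   Either way, the real fix needs to land on the Iceberg/Spark side, as tracked in the linked issue.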

