[GitHub] [iceberg] felixYyu opened a new issue #4114: Spark: select * from table.partitions Exception

GitBox Mon, 14 Feb 2022 01:04:43 -0800


felixYyu opened a new issue #4114:
URL: https://github.com/apache/iceberg/issues/4114



   Spark 3.2.1
   Iceberg 0.13.0
   
   1. Create a table partitioned by `hours(ts)`, run an insert overwrite, then 
drop the partition field `hours(ts)`. Running `select * from table.partitions` 
in Spark SQL then fails with this exception:
   ```
   Caused by: java.lang.IllegalStateException: Unknown type for long field. Type name: java.lang.Integer
        at org.apache.iceberg.spark.source.StructInternalRow.getLong(StructInternalRow.java:146)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:127)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   ```
   
   2. Create a table partitioned by `bucket(5, id)`, run an insert overwrite, then 
drop the partition field `bucket(5, id)`. Running `select * from table.partitions` 
in Spark SQL then fails with a different exception:
   ```
   Wrong class, java.lang.Long, for object: 0
   Exception in thread "main" java.lang.IllegalArgumentException: Wrong class, java.lang.Long, for object: 0
        at org.apache.iceberg.PartitionData.get(PartitionData.java:120)
        at org.apache.iceberg.types.Comparators$StructLikeComparator.compare(Comparators.java:126)
        at org.apache.iceberg.types.Comparators$StructLikeComparator.compare(Comparators.java:102)
        at org.apache.iceberg.util.StructLikeWrapper.equals(StructLikeWrapper.java:76)
        at java.util.HashMap.getNode(HashMap.java:572)
        at java.util.HashMap.get(HashMap.java:557)
        at org.apache.iceberg.PartitionsTable$PartitionMap.get(PartitionsTable.java:153)
        at org.apache.iceberg.PartitionsTable.partitions(PartitionsTable.java:101)
        at org.apache.iceberg.PartitionsTable.task(PartitionsTable.java:75)
        at org.apache.iceberg.PartitionsTable.access$300(PartitionsTable.java:35)
        at org.apache.iceberg.PartitionsTable$PartitionsScan.lambda$new$0(PartitionsTable.java:138)
        at org.apache.iceberg.StaticTableScan.planFiles(StaticTableScan.java:66)
        at org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:193)
        at org.apache.iceberg.spark.source.SparkBatchQueryScan.files(SparkBatchQueryScan.java:114)
        at org.apache.iceberg.spark.source.SparkBatchQueryScan.tasks(SparkBatchQueryScan.java:128)
        at org.apache.iceberg.spark.source.SparkBatchScan.planInputPartitions(SparkBatchScan.java:141)
        at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.partitions$lzycompute(BatchScanExec.scala:52)
        at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.partitions(BatchScanExec.scala:52)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar(DataSourceV2ScanExecBase.scala:93)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar$(DataSourceV2ScanExecBase.scala:92)
        at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.supportsColumnar(BatchScanExec.scala:35)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy.apply(DataSourceV2Strategy.scala:123)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
   ```
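
For anyone trying to reproduce this, the two cases above might be scripted roughly as follows. This is only a sketch of the described steps, not the reporter's actual script: the table name, the column layout (`id BIGINT, ts TIMESTAMP`), the inserted row, and the helper names `repro_statements` / `run_repro` are all assumptions, since the issue does not show the table definition. `DROP PARTITION FIELD` needs the Iceberg Spark SQL extensions to be enabled on the session.

```python
def repro_statements(table, partition_transform):
    """Build the Spark SQL statements for the steps described in the issue.

    Assumptions (not in the issue): the schema (id BIGINT, ts TIMESTAMP)
    and the single inserted row are made up for illustration.
    """
    return [
        # Step 1: create a table partitioned by the given transform.
        f"CREATE TABLE {table} (id BIGINT, ts TIMESTAMP) "
        f"USING iceberg PARTITIONED BY ({partition_transform})",
        # Step 2: write data with INSERT OVERWRITE.
        f"INSERT OVERWRITE {table} "
        f"VALUES (1, TIMESTAMP '2022-02-14 01:00:00')",
        # Step 3: drop the partition field (Iceberg SQL extension).
        f"ALTER TABLE {table} DROP PARTITION FIELD {partition_transform}",
        # Step 4: query the partitions metadata table.
        f"SELECT * FROM {table}.partitions",
    ]


def run_repro(spark, table, partition_transform):
    """Run each statement through spark.sql and return the last result."""
    result = None
    for stmt in repro_statements(table, partition_transform):
        result = spark.sql(stmt)
    return result
```

With a real `SparkSession` this would be called as `run_repro(spark, "demo.db.t_hours", "hours(ts)").show()` for case 1 and `run_repro(spark, "demo.db.t_bucket", "bucket(5, id)").show()` for case 2, where the catalog and table names are placeholders.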

