RussellSpitzer opened a new issue #1735:
URL: https://github.com/apache/iceberg/issues/1735
Reading a metadata table of a table created with Iceberg 0.8~ fails when
column pruning is applied in Spark with Iceberg 0.9.0 or 0.9.1.
```scala
scala> val df = spark.read.format("iceberg").load("file:///Users/russellspitzer/Temp/OldIcebergTable#all_entries").show
+------+-------------------+---------------+--------------------+
|status| snapshot_id|sequence_number| data_file|
+------+-------------------+---------------+--------------------+
| 1|3638703680170519943| 0|[0, file:/var/fol...|
| 1|3402083355630191798| 0|[0, file:/var/fol...|
| 0|3402083355630191798| 0|[0, file:/var/fol...|
| 2|8080930776000500924| 0|[0, file:/var/fol...|
+------+-------------------+---------------+--------------------+
scala> val df = spark.read.format("iceberg").load("file:///Users/russellspitzer/Temp/OldIcebergTable#all_entries").select("status").show
20/11/06 10:55:27 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.IllegalArgumentException: Missing required field: data_file
```
This does not occur if the table is created with the same Iceberg version
that reads it (0.9.0 or 0.9.1). A different error occurs when pruning to a
field nested inside the data_file struct.
```
val df = spark.read.format("iceberg").load("file:///Users/russellspitzer/Temp/OldIcebergTable#all_entries").select("data_file.file_path").show
20/11/06 11:45:42 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ClassCastException: Cannot cast java.lang.String to java.lang.Integer
```
Selecting just the data_file column works:
```
scala> val df = spark.read.format("iceberg").load("file:///Users/russellspitzer/Temp/OldIcebergTable#all_entries").select("data_file").show
+--------------------+
| data_file|
+--------------------+
|[0, file:/var/fol...|
|[0, file:/var/fol...|
|[0, file:/var/fol...|
|[0, file:/var/fol...|
+--------------------+
```
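To check all three projections in one pass, something like the following could be pasted into spark-shell. It is only a sketch: `probe` is a hypothetical helper (not part of Iceberg or Spark), `spark` is the session that spark-shell provides, and the path is the one from this report, so it should be replaced with a table written by the older Iceberg version.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success, Try}

// Hypothetical helper: runs one projection against the all_entries
// metadata table and reports whether the read succeeded.
def probe(spark: SparkSession, tablePath: String, column: String): Unit = {
  val result = Try {
    spark.read
      .format("iceberg")
      .load(s"$tablePath#all_entries")
      .select(column)
      .collect() // force execution so executor-side failures surface on the driver
  }
  result match {
    case Success(rows) =>
      println(s"select($column): ok, ${rows.length} rows")
    case Failure(e) =>
      // On the driver the task error is usually wrapped in a SparkException,
      // so the original exception appears inside the message or cause chain.
      println(s"select($column): failed with ${e.getClass.getName}: ${e.getMessage}")
  }
}

// Path taken from the report; substitute your own table location.
val tablePath = "file:///Users/russellspitzer/Temp/OldIcebergTable"

// The three projections described above: a top-level column, a nested
// field of data_file, and the whole data_file struct.
Seq("status", "data_file.file_path", "data_file").foreach(probe(spark, tablePath, _))
```

Running the same snippet against a table created with 0.9.0 or 0.9.1 gives a side-by-side comparison with the old table.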