syun64 opened a new issue, #6978:
URL: https://github.com/apache/iceberg/issues/6978

   ### Apache Iceberg version
   
   1.1.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Time travel (reading as of a specific snapshot ID) fails on metadata tables if the Iceberg table has ever undergone schema evolution. This looks like an unintended side effect of #3722, the PR that made reads of a snapshot use that snapshot's schema.
   
   Since schema evolution is not supported on metadata tables, we could patch this bug by checking whether the Iceberg table is an instance of [BaseMetadataTable](https://github.com/wypoon/iceberg/blob/03d80eb735f89c8318a7d83ec3baa1b3119642de/core/src/main/java/org/apache/iceberg/BaseMetadataTable.java#L163) before making the [snapshotSchema](https://github.com/apache/iceberg/blob/master/spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L133) call.
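
The proposed guard can be illustrated with a small toy model. This is only a sketch in plain Python, not Iceberg code: the classes `Schema`, `Snapshot`, `Table`, and `BaseMetadataTable` and the functions `schema_for` and `snapshot_schema` are stand-ins that borrow names from the stack trace and the links above. It assumes that a metadata table exposes only its own fixed schema while sharing the base table's snapshots, which would explain why looking up the snapshot's schema id fails after schema evolution. `LookupError` stands in for the Java `IllegalStateException`.

```python
class Schema:
    """Stand-in for an Iceberg schema: an id plus column names."""

    def __init__(self, schema_id, columns):
        self.schema_id = schema_id
        self.columns = columns


class Snapshot:
    """Stand-in for a snapshot that records the schema id it was written with."""

    def __init__(self, snapshot_id, schema_id):
        self.snapshot_id = snapshot_id
        self.schema_id = schema_id


class Table:
    """Stand-in for a data table that tracks every schema version."""

    def __init__(self, current_schema, schemas, snapshots):
        self._current_schema = current_schema
        self._schemas = schemas      # dict: schema id -> Schema
        self._snapshots = snapshots  # dict: snapshot id -> Snapshot

    def schema(self):
        return self._current_schema

    def schemas(self):
        return self._schemas

    def snapshot(self, snapshot_id):
        return self._snapshots[snapshot_id]


class BaseMetadataTable(Table):
    """Stand-in for a metadata table such as db.table.files.

    Assumption: it knows only its own fixed schema but reuses the
    base table's snapshots.
    """

    def __init__(self, base_table, metadata_schema):
        super().__init__(
            metadata_schema,
            {metadata_schema.schema_id: metadata_schema},
            base_table._snapshots,
        )


def schema_for(table, snapshot_id):
    """Look up the schema a snapshot was written with (unguarded)."""
    snapshot = table.snapshot(snapshot_id)
    schema = table.schemas().get(snapshot.schema_id)
    if schema is None:
        raise LookupError(
            f"Cannot find schema with schema id {snapshot.schema_id}"
        )
    return schema


def snapshot_schema(table, snapshot_id=None):
    """Guarded variant: metadata tables always use their own schema."""
    if isinstance(table, BaseMetadataTable):
        return table.schema()
    if snapshot_id is not None:
        return schema_for(table, snapshot_id)
    return table.schema()


# Hypothetical data: the base table evolved from schema 0 to schema 1,
# and the snapshot being read was written with schema 1.
SNAPSHOT_ID = 10963874102873
v0 = Schema(0, ["id"])
v1 = Schema(1, ["id", "name"])
base = Table(v1, {0: v0, 1: v1}, {SNAPSHOT_ID: Snapshot(SNAPSHOT_ID, 1)})
files_schema = Schema(0, ["file_path", "record_count"])
files_table = BaseMetadataTable(base, files_schema)
```

In this model, `schema_for(files_table, SNAPSHOT_ID)` raises because the metadata table has no schema with id 1, while `snapshot_schema(files_table, SNAPSHOT_ID)` returns the metadata table's own schema and `snapshot_schema(base, SNAPSHOT_ID)` still resolves the snapshot's schema on the data table.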
   
   
   Example query:
   
   `spark.read.format("iceberg").option("snapshot-id", 10963874102873L).load("db.table.files")`
   
   Example Error after Schema evolution:
   
   ```
   Py4JJavaError: An error occurred while calling o373.load.
   : java.lang.IllegalStateException: Cannot find schema with schema id 1
        at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkState(Preconditions.java:590)
        at org.apache.iceberg.util.SnapshotUtil.schemaFor(SnapshotUtil.java:363)
        at org.apache.iceberg.util.SnapshotUtil.schemaFor(SnapshotUtil.java:388)
        at org.apache.iceberg.spark.source.SparkTable.snapshotSchema(SparkTable.java:127)
        at org.apache.iceberg.spark.source.SparkTable.schema(SparkTable.java:133)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$.create(DataSourceV2Relation.scala:176)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:303)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:265)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
        at jdk.internal.reflect.GeneratedMethodAccessor210.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.base/java.lang.Thread.run(Thread.java:829)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

