syun64 commented on issue #6978:
URL: https://github.com/apache/iceberg/issues/6978#issuecomment-1450880716

   This issue only occurs after the schema of the underlying Iceberg table 
(not the metadata table itself) has been updated.
   In the following example, I add a new column to the existing table to 
trigger a schema update, so the table's metadata.json file now lists a total 
of 2 schemas for the Iceberg table:
   
   (SimpleExtraColumnRecord is a SimpleRecord with just one extra string column)
   ```
     @Test
     public void testFilesVersionAsOf() throws Exception {
       // Create table and insert data
       sql(
               "CREATE TABLE %s (id bigint, data string) "
                       + "USING iceberg "
                       + "PARTITIONED BY (data) "
                       + "TBLPROPERTIES"
                       + "('format-version'='2', 'write.delete.mode'='merge-on-read')",
               tableName);
   
       List<SimpleRecord> recordsA =
               Lists.newArrayList(new SimpleRecord(1, "a"), new SimpleRecord(2, "a"));
       spark
               .createDataset(recordsA, Encoders.bean(SimpleRecord.class))
               .coalesce(1)
               .writeTo(tableName)
               .append();
   
       Table table = Spark3Util.loadIcebergTable(spark, tableName);
       Long olderSnapshotId = table.currentSnapshot().snapshotId();
   
       sql(
               "ALTER TABLE %s ADD COLUMNS (data2 string)",
               tableName);
   
       List<SimpleExtraColumnRecord> recordsB =
               Lists.newArrayList(
                       new SimpleExtraColumnRecord(1, "b", "c"),
                       new SimpleExtraColumnRecord(2, "b", "c"));
       spark
               .createDataset(recordsB, Encoders.bean(SimpleExtraColumnRecord.class))
               .coalesce(1)
               .writeTo(tableName)
               .append();
   
   
       List<Object[]> res1 =
               sql("SELECT * from %s.files VERSION AS OF %s", tableName, olderSnapshotId);
   
       Dataset<Row> ds =
               spark.read().format("iceberg").option("snapshot-id", olderSnapshotId).load(tableName + ".files");
       List<Row> res2 = ds.collectAsList();
   
       Long currentSnapshotId = table.currentSnapshot().snapshotId();
   
       List<Object[]> res3 =
               sql("SELECT * from %s.files VERSION AS OF %s", tableName, currentSnapshotId);
   
       Dataset<Row> ds2 =
               spark.read().format("iceberg").option("snapshot-id", currentSnapshotId).load(tableName + ".files");
       List<Row> res4 = ds2.collectAsList();
     }
   ```
   
   This test fails on this call:
   `List<Object[]> res3 = sql("SELECT * from %s.files VERSION AS OF %s", tableName, currentSnapshotId);`
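
The test above uses `SimpleExtraColumnRecord`, which is not shown in the comment. For anyone trying to reproduce this, a minimal sketch of such a bean might look like the following. The field types and the getter/setter names are assumptions: they follow the description "a SimpleRecord with just one extra string column" and the constructor calls in the test, not the author's actual class.

```java
import java.util.Objects;

// Hypothetical sketch of the bean used in the test above; the real class may differ.
// Spark's Encoders.bean expects a publicly accessible class, so in an actual test
// this would be declared `public` in its own file (or as a public static nested class).
class SimpleExtraColumnRecord {
  private Integer id;    // assumed to mirror SimpleRecord's id
  private String data;   // assumed to mirror SimpleRecord's data
  private String data2;  // the extra string column added by ALTER TABLE

  // Bean encoders need a no-arg constructor plus getters and setters.
  public SimpleExtraColumnRecord() {}

  public SimpleExtraColumnRecord(Integer id, String data, String data2) {
    this.id = id;
    this.data = data;
    this.data2 = data2;
  }

  public Integer getId() { return id; }
  public void setId(Integer id) { this.id = id; }

  public String getData() { return data; }
  public void setData(String data) { this.data = data; }

  public String getData2() { return data2; }
  public void setData2(String data2) { this.data2 = data2; }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SimpleExtraColumnRecord)) {
      return false;
    }
    SimpleExtraColumnRecord other = (SimpleExtraColumnRecord) o;
    return Objects.equals(id, other.id)
        && Objects.equals(data, other.data)
        && Objects.equals(data2, other.data2);
  }

  @Override
  public int hashCode() {
    return Objects.hash(id, data, data2);
  }

  @Override
  public String toString() {
    return "SimpleExtraColumnRecord{id=" + id + ", data=" + data + ", data2=" + data2 + "}";
  }
}
```

The property name `data2` is chosen to match the column added by `ALTER TABLE ... ADD COLUMNS (data2 string)`, since the bean encoder derives column names from the bean's properties.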


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

