[GitHub] [iceberg] syun64 commented on issue #6978: Reading as of Snapshot ID fails on Metadata Tables after Schema Evolution

via GitHub Wed, 01 Mar 2023 13:37:24 -0800


syun64 commented on issue #6978:
URL: https://github.com/apache/iceberg/issues/6978#issuecomment-1450880716


   This issue only happens after the schema of the underlying Iceberg table 
(not of the metadata table itself) has been updated.
   In the following example, I add a new column to the existing table to 
trigger a schema update, so the metadata.json file now contains two schemas 
for the Iceberg table:
   
   (SimpleExtraColumnRecord is a SimpleRecord with just one extra string column)
   ```
     @Test
     public void testFilesVersionAsOf() throws Exception {
       // Create table and insert data
       sql(
               "CREATE TABLE %s (id bigint, data string) "
                       + "USING iceberg "
                       + "PARTITIONED BY (data) "
                       + "TBLPROPERTIES"
                       + "('format-version'='2', 'write.delete.mode'='merge-on-read')",
               tableName);
   
       List<SimpleRecord> recordsA =
               Lists.newArrayList(new SimpleRecord(1, "a"), new SimpleRecord(2, "a"));
       spark
               .createDataset(recordsA, Encoders.bean(SimpleRecord.class))
               .coalesce(1)
               .writeTo(tableName)
               .append();
   
       Table table = Spark3Util.loadIcebergTable(spark, tableName);
       Long olderSnapshotId = table.currentSnapshot().snapshotId();
   
       sql(
               "ALTER TABLE %s ADD COLUMNS (data2 string)",
               tableName);
   
       List<SimpleExtraColumnRecord> recordsB =
               Lists.newArrayList(
                       new SimpleExtraColumnRecord(1, "b", "c"),
                       new SimpleExtraColumnRecord(2, "b", "c"));
       spark
               .createDataset(recordsB, Encoders.bean(SimpleExtraColumnRecord.class))
               .coalesce(1)
               .writeTo(tableName)
               .append();
   
   
       List<Object[]> res1 =
               sql("SELECT * from %s.files VERSION AS OF %s", tableName, olderSnapshotId);
   
       Dataset<Row> ds =
               spark.read()
                       .format("iceberg")
                       .option("snapshot-id", olderSnapshotId)
                       .load(tableName + ".files");
       List<Row> res2 = ds.collectAsList();
   
       Long currentSnapshotId = table.currentSnapshot().snapshotId();
   
       List<Object[]> res3 =
               sql("SELECT * from %s.files VERSION AS OF %s", tableName, currentSnapshotId);
   
       Dataset<Row> ds2 =
               spark.read()
                       .format("iceberg")
                       .option("snapshot-id", currentSnapshotId)
                       .load(tableName + ".files");
       List<Row> res4 = ds2.collectAsList();
     }
   ```
   
   This test fails on this call:
   `List<Object[]> res3 = sql("SELECT * from %s.files VERSION AS OF %s", tableName, currentSnapshotId);`
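
   The test above uses `SimpleExtraColumnRecord`, which is not shown. For 
anyone trying to reproduce this, a minimal sketch of such a bean could look 
like the following. The field names, the `Integer` type for `id`, and the 
property name `data2` (taken from the `ALTER TABLE` statement) are 
assumptions, not the actual class from my test. It is written as a standalone 
class rather than a subclass of `SimpleRecord` so that it compiles on its own.

   ```java
   // Hypothetical bean with the two SimpleRecord-style fields plus one extra
   // string column. Bean encoders work from a no-arg constructor and
   // getter/setter pairs, so all three properties expose both.
   // Note: for use with Encoders.bean the class likely needs to be public and
   // in its own file; it is package-private here only to stay self-contained.
   class SimpleExtraColumnRecord {
     private Integer id;
     private String data;
     private String data2; // assumed to match the column added by ALTER TABLE

     // No-arg constructor for bean-style instantiation.
     public SimpleExtraColumnRecord() {}

     public SimpleExtraColumnRecord(Integer id, String data, String data2) {
       this.id = id;
       this.data = data;
       this.data2 = data2;
     }

     public Integer getId() {
       return id;
     }

     public void setId(Integer id) {
       this.id = id;
     }

     public String getData() {
       return data;
     }

     public void setData(String data) {
       this.data = data;
     }

     public String getData2() {
       return data2;
     }

     public void setData2(String data2) {
       this.data2 = data2;
     }
   }
   ```

   With a class shaped like this, `new SimpleExtraColumnRecord(1, "b", "c")` 
populates `id`, `data`, and `data2`, which lines up with the three columns of 
the table after the schema update.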

