syun64 commented on issue #6978:
URL: https://github.com/apache/iceberg/issues/6978#issuecomment-1450880716
This issue only happens after the schema of the underlying Iceberg table
(not the metadata table) has been updated.
In the following example, I add a new column to the existing table to
trigger a schema update, so the table's metadata.json file now lists a
total of 2 schemas for the Iceberg table:
(SimpleExtraColumnRecord is a SimpleRecord with just one extra string column)
```
@Test
public void testFilesVersionAsOf() throws Exception {
  // Create table and insert data
  sql(
      "CREATE TABLE %s (id bigint, data string) "
          + "USING iceberg "
          + "PARTITIONED BY (data) "
          + "TBLPROPERTIES"
          + "('format-version'='2', 'write.delete.mode'='merge-on-read')",
      tableName);

  List<SimpleRecord> recordsA =
      Lists.newArrayList(new SimpleRecord(1, "a"), new SimpleRecord(2, "a"));
  spark
      .createDataset(recordsA, Encoders.bean(SimpleRecord.class))
      .coalesce(1)
      .writeTo(tableName)
      .append();

  // Snapshot written with the original schema
  Table table = Spark3Util.loadIcebergTable(spark, tableName);
  Long olderSnapshotId = table.currentSnapshot().snapshotId();

  // Schema update: the table metadata now holds two schemas
  sql("ALTER TABLE %s ADD COLUMNS (data2 string)", tableName);

  List<SimpleExtraColumnRecord> recordsB =
      Lists.newArrayList(
          new SimpleExtraColumnRecord(1, "b", "c"),
          new SimpleExtraColumnRecord(2, "b", "c"));
  spark
      .createDataset(recordsB, Encoders.bean(SimpleExtraColumnRecord.class))
      .coalesce(1)
      .writeTo(tableName)
      .append();

  // Time travel on the files metadata table to the older snapshot
  List<Object[]> res1 =
      sql("SELECT * from %s.files VERSION AS OF %s", tableName, olderSnapshotId);
  Dataset<Row> ds =
      spark.read().format("iceberg").option("snapshot-id", olderSnapshotId)
          .load(tableName + ".files");
  List<Row> res2 = ds.collectAsList();

  // Same two reads against the snapshot written after the schema update
  Long currentSnapshotId = table.currentSnapshot().snapshotId();
  List<Object[]> res3 =
      sql("SELECT * from %s.files VERSION AS OF %s", tableName, currentSnapshotId);
  Dataset<Row> ds2 =
      spark.read().format("iceberg").option("snapshot-id", currentSnapshotId)
          .load(tableName + ".files");
  List<Row> res4 = ds2.collectAsList();
}
```
This test fails on this call:
`List<Object[]> res3 = sql("SELECT * from %s.files VERSION AS OF %s", tableName, currentSnapshotId);`
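For anyone reproducing this: SimpleExtraColumnRecord is not shown above. A
minimal sketch of what it could look like follows, assuming a plain Java bean
with the same `id` and `data` fields as SimpleRecord plus one extra string
field. The field name `data2` (chosen to match the added column), the
`Integer` type for `id`, and the standalone layout (no inheritance from
SimpleRecord) are my assumptions here, not the exact class I used:

```java
import java.util.Objects;

// Hypothetical bean for the reproduction: SimpleRecord's two fields plus one
// extra string field. Spark's bean encoder relies on the no-arg constructor
// and the getter/setter pairs; it generally also needs the class to be
// public, so declare it public in its own file when using Encoders.bean.
class SimpleExtraColumnRecord {
  private Integer id;
  private String data;
  private String data2; // assumed name, matches the column added by ALTER TABLE

  // No-arg constructor so the bean can be instantiated reflectively
  public SimpleExtraColumnRecord() {}

  public SimpleExtraColumnRecord(Integer id, String data, String data2) {
    this.id = id;
    this.data = data;
    this.data2 = data2;
  }

  public Integer getId() {
    return id;
  }

  public void setId(Integer id) {
    this.id = id;
  }

  public String getData() {
    return data;
  }

  public void setData(String data) {
    this.data = data;
  }

  public String getData2() {
    return data2;
  }

  public void setData2(String data2) {
    this.data2 = data2;
  }

  // Value equality over all three fields
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SimpleExtraColumnRecord)) {
      return false;
    }
    SimpleExtraColumnRecord other = (SimpleExtraColumnRecord) o;
    return Objects.equals(id, other.id)
        && Objects.equals(data, other.data)
        && Objects.equals(data2, other.data2);
  }

  @Override
  public int hashCode() {
    return Objects.hash(id, data, data2);
  }
}
```

Any bean that exposes the three columns through getters and setters should
work equally well for the second append.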
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]