amogh-jahagirdar commented on code in PR #16699:
URL: https://github.com/apache/iceberg/pull/16699#discussion_r3375162940


##########
spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteManifestsAction.java:
##########
@@ -196,6 +196,144 @@ public void testRewriteManifestsPreservesOptionalFields() throws IOException {
     }
   }
 
+  @TestTemplate
+  public void testRewriteV3ManifestsPreservesFirstRowId() {
+    assumeThat(formatVersion).isGreaterThanOrEqualTo(3);
+
+    PartitionSpec spec = PartitionSpec.unpartitioned();
+    Map<String, String> options = Maps.newHashMap();
+    options.put(TableProperties.FORMAT_VERSION, String.valueOf(formatVersion));
+    Table table = TABLES.create(SCHEMA, spec, options, tableLocation);
+
+    writeRecords(Lists.newArrayList(new ThreeColumnRecord(1, null, "AAAA")));
+    writeRecords(Lists.newArrayList(new ThreeColumnRecord(2, "CCCC", "CCCC")));
+    table.refresh();
+
+    assertThat(table.currentSnapshot().dataManifests(table.io())).hasSize(2);
+
+    List<Row> rowsBefore =
+        spark
+            .read()
+            .format("iceberg")
+            .load(tableLocation)
+            .selectExpr("_row_id", "_last_updated_sequence_number", "*")
+            .orderBy("_row_id")
+            .collectAsList();
+    assertThat(rowsBefore).extracting(r -> r.<Long>getAs("_row_id")).doesNotContainNull();

Review Comment:
   I'm not sure we should check exact row IDs in this context. The 
implementation could change to assign row IDs starting from a different 
offset while remaining spec compliant, and ideally we wouldn't have to update 
these tests when that happens. The update/merge DML tests set up files with 
explicit row IDs and assert on them because those tests are actually verifying 
carry-over behavior. Here, we only want to make sure that each record has the 
same row ID before and after a rewrite manifests operation.
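
   A minimal sketch of the before/after comparison described above, in plain 
Java with no Spark or Iceberg dependency. `RowIdConsistency` and `changedKeys` 
are hypothetical names that do not exist in the PR, and keying each record by 
a unique value (for example an integer column of the test record) is an 
assumption made for illustration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Iceberg or of the PR under review.
public class RowIdConsistency {

  // Returns the record keys whose row ID differs between the two maps.
  // A key present in only one map counts as changed. No absolute row ID
  // value is asserted, so a different starting offset does not matter.
  static <K extends Comparable<K>> Set<K> changedKeys(
      Map<K, Long> before, Map<K, Long> after) {
    Set<K> allKeys = new TreeSet<>(before.keySet());
    allKeys.addAll(after.keySet());

    Set<K> changed = new TreeSet<>();
    for (K key : allKeys) {
      if (!Objects.equals(before.get(key), after.get(key))) {
        changed.add(key);
      }
    }
    return changed;
  }
}
```

   In the test, one map would be collected before the rewrite and one after, 
and the assertion would be that `changedKeys` returns an empty set.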



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

