voonhous commented on code in PR #19002:
URL: https://github.com/apache/hudi/pull/19002#discussion_r3415264847


##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/common/table/log/TestLogReaderUtils.java:
##########
@@ -131,6 +132,48 @@ public void testGetAllLogFilesWithMaxCommit() throws Exception {
     }
   }
 
+  @Test
+  public void testLogFileWriteStatSizeMatchesOnDisk() throws Exception {
+    // HoodieAppendHandle records each log file's size by deriving it from AppendResult
+    // (logOffset + accumulated appended bytes) instead of a getPathInfo per file. Validate that the
+    // derived size in the write stat matches the actual on-disk log file length, across multiple
+    // appends to the same log file (two upsert commits with no compaction / small-file packing).
+    HoodieTableMetaClient metaClient = getHoodieMetaClient(HoodieTableType.MERGE_ON_READ, new Properties());
+
+    HoodieWriteConfig config = getConfigBuilder(true)
+        .withPath(basePath())
+        .withCompactionConfig(HoodieCompactionConfig.newBuilder()
+            .withInlineCompaction(false)
+            .compactionSmallFileSize(0)
+            .build())
+        .build();
+
+    HoodieTestDataGenerator dataGen = new HoodieTestDataGenerator();
+
+    try (SparkRDDWriteClient client = getHoodieWriteClient(config)) {
+      // First commit - insert data (base files)
+      String firstCommit = "001";
+      WriteClientTestUtils.startCommitWithTime(client, firstCommit);
+      JavaRDD<WriteStatus> insertRdd = client.insert(jsc().parallelize(dataGen.generateInserts(firstCommit, 100), 1), firstCommit);
+      assertNoWriteErrors(insertRdd.collect());
+      client.commit(firstCommit, insertRdd);
+
+      // Upsert across two commits so each log file accumulates multiple appends through the handle

Review Comment:
   Good catch. Parameterized the test over write table versions 6 and 9: under 
v6 the second upsert appends to the existing log file (logOffset > 0), so the 
derived sum is now actually exercised; v9 covers the offset-0 path. Reworded 
the comments to call out the per-version behavior.
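
   For readers following along, here is a minimal sketch of the arithmetic
   being validated, assuming the derived size is the offset at which the
   first append started plus the bytes written by every append. The `Append`
   class and `derivedSize` method are hypothetical stand-ins for illustration
   only; they are not Hudi's `AppendResult` or the handle's actual code.

```java
import java.util.Arrays;
import java.util.List;

public class DerivedLogSizeSketch {

  // Hypothetical stand-in for the result of one append to a log file.
  // This is NOT Hudi's AppendResult; the fields are assumptions for illustration.
  static final class Append {
    final long logOffset;     // file length before this append started
    final long bytesWritten;  // bytes added by this append

    Append(long logOffset, long bytesWritten) {
      this.logOffset = logOffset;
      this.bytesWritten = bytesWritten;
    }
  }

  // Derive the final file size without a file-system lookup:
  // starting offset of the first append + sum of all appended bytes.
  static long derivedSize(List<Append> appends) {
    if (appends.isEmpty()) {
      return 0L;
    }
    long size = appends.get(0).logOffset;
    for (Append append : appends) {
      size += append.bytesWritten;
    }
    return size;
  }

  public static void main(String[] args) {
    // Offset-0 path: a new log file receives two appends.
    long fresh = derivedSize(Arrays.asList(new Append(0, 120), new Append(120, 80)));

    // Offset > 0 path: one append lands on an existing 200-byte log file.
    long appended = derivedSize(Arrays.asList(new Append(200, 50)));

    System.out.println(fresh + " " + appended);
  }
}
```

   The second case is the one that a non-zero logOffset exercises: if the
   starting offset were dropped from the sum, the derived size would be 50
   rather than 250 and would no longer match the on-disk length.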


