wombatu-kun commented on code in PR #19002:
URL: https://github.com/apache/hudi/pull/19002#discussion_r3409463412


##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/common/table/log/TestLogReaderUtils.java:
##########
@@ -131,6 +132,48 @@ public void testGetAllLogFilesWithMaxCommit() throws Exception {
     }
   }
 
+  @Test
+  public void testLogFileWriteStatSizeMatchesOnDisk() throws Exception {
+    // HoodieAppendHandle records each log file's size by deriving it from AppendResult
+    // (logOffset + accumulated appended bytes) instead of a getPathInfo per file. Validate that the
+    // derived size in the write stat matches the actual on-disk log file length, across multiple
+    // appends to the same log file (two upsert commits with no compaction / small-file packing).
+    HoodieTableMetaClient metaClient = getHoodieMetaClient(HoodieTableType.MERGE_ON_READ, new Properties());
+
+    HoodieWriteConfig config = getConfigBuilder(true)
+        .withPath(basePath())
+        .withCompactionConfig(HoodieCompactionConfig.newBuilder()
+            .withInlineCompaction(false)
+            .compactionSmallFileSize(0)
+            .build())
+        .build();
+
+    HoodieTestDataGenerator dataGen = new HoodieTestDataGenerator();
+
+    try (SparkRDDWriteClient client = getHoodieWriteClient(config)) {
+      // First commit - insert data (base files)
+      String firstCommit = "001";
+      WriteClientTestUtils.startCommitWithTime(client, firstCommit);
+      JavaRDD<WriteStatus> insertRdd = client.insert(jsc().parallelize(dataGen.generateInserts(firstCommit, 100), 1), firstCommit);
+      assertNoWriteErrors(insertRdd.collect());
+      client.commit(firstCommit, insertRdd);
+
+      // Upsert across two commits so each log file accumulates multiple appends through the handle

Review Comment:
   This comment (and the block comment at the top of the method) says the two 
upsert commits make each log file "accumulate multiple appends through the 
handle." That does not happen under the default write version (NINE): each 
delta commit writes a new instant-named log file starting at offset 0 (in 
HoodieWriteHandle.createLogWriter, the version >= EIGHT branch builds the 
writer with withFileSize(0L)). Each log file therefore gets a single append, 
getLogOffset() is 0 for every status here, and this assertion passes 
identically with or without the `logOffset +` term. The term only matters when 
appending to a pre-existing log file (the table version SIX branch). To 
actually exercise the derived sum, drive a logOffset > 0 case (e.g. build the 
config with write table version SIX so the second commit appends to the 
existing file); otherwise, reword the comments to say that only the offset-0 
path is covered.
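
   To illustrate why the offset-0 case cannot catch a dropped `logOffset +` 
term, here is a minimal standalone sketch. It is not Hudi code: 
`LogSizeSketch`, `derivedSize`, and `derivedSizeWithoutOffset` are 
hypothetical names, and the byte counts are made up for the example.

```java
// Hypothetical stand-in for the size derivation discussed above; not Hudi code.
public final class LogSizeSketch {
  private LogSizeSketch() {}

  /** Derived size as the test comment describes it: start offset plus appended bytes. */
  public static long derivedSize(long logOffset, long appendedBytes) {
    return logOffset + appendedBytes;
  }

  /** Deliberately wrong variant that drops the offset term. */
  public static long derivedSizeWithoutOffset(long logOffset, long appendedBytes) {
    return appendedBytes;
  }

  public static void main(String[] args) {
    // Case 1: a fresh log file per delta commit, so the writer starts at offset 0.
    long freshFileOnDisk = 120L; // illustrative byte count
    System.out.println(derivedSize(0L, 120L) == freshFileOnDisk);              // true
    System.out.println(derivedSizeWithoutOffset(0L, 120L) == freshFileOnDisk); // true, bug is hidden

    // Case 2: appending 80 bytes to an existing 120-byte log file (offset > 0).
    long appendedFileOnDisk = 120L + 80L;
    System.out.println(derivedSize(120L, 80L) == appendedFileOnDisk);              // true
    System.out.println(derivedSizeWithoutOffset(120L, 80L) == appendedFileOnDisk); // false, bug is caught
  }
}
```

   Only the second case distinguishes the two formulas, which is why the test 
needs at least one write status with a non-zero log offset to validate the sum.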



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
