hudi-agent commented on code in PR #19002:
URL: https://github.com/apache/hudi/pull/19002#discussion_r3409078359


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -564,14 +564,15 @@ public List<WriteStatus> close() {
         writer = null;
       }
 
-      // update final size, once for all log files
-      // TODO we can actually deduce file size purely from AppendResult (based on offset and size
-      //      of the appended block)
+      // Set the final on-disk size of each log file. Appends within an append handle are contiguous,
+      // so a log file's length equals its start offset plus the total bytes appended to it. That is
+      // exactly what fs.getFileStatus().getLength() returns, and both values are already captured by
+      // the AppendResult stats (logOffset and the accumulated fileSizeInBytes). Deriving the size this
+      // way avoids a getPathInfo/HEAD per log file, which is a remote round trip per file group on
+      // object stores.
       for (WriteStatus status : statuses) {
-        long logFileSize = storage.getPathInfo(
-            new StoragePath(config.getBasePath(), status.getStat().getPath()))
-            .getLength();
-        status.getStat().setFileSizeInBytes(logFileSize);
+        HoodieDeltaWriteStat stat = (HoodieDeltaWriteStat) status.getStat();
+        stat.setFileSizeInBytes(stat.getLogOffset() + stat.getFileSizeInBytes());

Review Comment:
   🤖 nit: the `fileSizeInBytes` field plays two different roles in this one
   statement: on the right-hand side it holds the accumulated appended bytes,
   and on the left-hand side it becomes the total on-disk size. That makes the
   line easy to misread at a glance. A named local variable would make the
   before-state explicit:
   `long appendedBytes = stat.getFileSizeInBytes(); stat.setFileSizeInBytes(stat.getLogOffset() + appendedBytes);`
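
The suggestion can be tried in isolation with a minimal sketch. `DeltaStatSketch` and `finalizeOnDiskSize` are hypothetical names for illustration only; the class is a stand-in for `HoodieDeltaWriteStat` and models just the two fields used in the statement, not the real Hudi type.

```java
// Hypothetical stand-in for HoodieDeltaWriteStat (assumption: only these two
// fields matter for the statement under review).
final class DeltaStatSketch {
    private final long logOffset;   // offset in the log file where this handle's appends began
    private long fileSizeInBytes;   // before finalization: bytes appended by this handle

    DeltaStatSketch(long logOffset, long appendedBytes) {
        this.logOffset = logOffset;
        this.fileSizeInBytes = appendedBytes;
    }

    long getLogOffset() {
        return logOffset;
    }

    long getFileSizeInBytes() {
        return fileSizeInBytes;
    }

    void setFileSizeInBytes(long fileSizeInBytes) {
        this.fileSizeInBytes = fileSizeInBytes;
    }

    // Rewrites fileSizeInBytes from "appended bytes" to "total on-disk size".
    // The named local makes the field's earlier meaning explicit before it is overwritten.
    static void finalizeOnDiskSize(DeltaStatSketch stat) {
        long appendedBytes = stat.getFileSizeInBytes();
        stat.setFileSizeInBytes(stat.getLogOffset() + appendedBytes);
    }
}
```

In this sketch, a stat created with a start offset of 1024 and 512 appended bytes ends up with a size of 1536 after one call. The rewrite is not idempotent: a second call would add the offset again, which is another reason to keep the two meanings of the field visibly separate.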
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
