hudi-agent commented on code in PR #19002:
URL: https://github.com/apache/hudi/pull/19002#discussion_r3409078359
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -564,14 +564,15 @@ public List<WriteStatus> close() {
writer = null;
}
- // update final size, once for all log files
- // TODO we can actually deduce file size purely from AppendResult (based on offset and size
- // of the appended block)
+ // Set the final on-disk size of each log file. Appends within an append handle are contiguous,
+ // so a log file's length equals its start offset plus the total bytes appended to it. That is
+ // exactly what fs.getFileStatus().getLength() returns, and both values are already captured by
+ // the AppendResult stats (logOffset and the accumulated fileSizeInBytes). Deriving the size this
+ // way avoids a getPathInfo/HEAD per log file, which is a remote round trip per file group on
+ // object stores.
for (WriteStatus status : statuses) {
- long logFileSize = storage.getPathInfo(
- new StoragePath(config.getBasePath(), status.getStat().getPath()))
- .getLength();
- status.getStat().setFileSizeInBytes(logFileSize);
+ HoodieDeltaWriteStat stat = (HoodieDeltaWriteStat) status.getStat();
+ stat.setFileSizeInBytes(stat.getLogOffset() + stat.getFileSizeInBytes());
Review Comment:
🤖 nit: the `fileSizeInBytes` field plays two different roles in this one
statement: it holds the accumulated appended bytes on the right-hand side and
the total on-disk size on the left. That makes the line easy to misread at a
glance. A named local variable would make the before-state explicit:
`long appendedBytes = stat.getFileSizeInBytes(); stat.setFileSizeInBytes(stat.getLogOffset() + appendedBytes);`
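
The suggested rename can be sketched in isolation as follows. This is a minimal illustration, not the Hudi implementation: `DeltaStat` is a hypothetical stand-in for `HoodieDeltaWriteStat` that models only the three accessors visible in the diff, and `finalizeFileSize` and `LogFileSizeSketch` are made-up names.

```java
public class LogFileSizeSketch {

    // Hypothetical stand-in for HoodieDeltaWriteStat; only the three accessors
    // that appear in the diff are modeled here.
    static final class DeltaStat {
        private final long logOffset;
        private long fileSizeInBytes;

        DeltaStat(long logOffset, long appendedBytes) {
            this.logOffset = logOffset;
            // Before finalization this field holds the accumulated appended bytes.
            this.fileSizeInBytes = appendedBytes;
        }

        long getLogOffset() {
            return logOffset;
        }

        long getFileSizeInBytes() {
            return fileSizeInBytes;
        }

        void setFileSizeInBytes(long fileSizeInBytes) {
            this.fileSizeInBytes = fileSizeInBytes;
        }
    }

    // Naming the intermediate value makes the two roles of the field explicit:
    // appended bytes going in, total on-disk size coming out.
    static long finalizeFileSize(DeltaStat stat) {
        long appendedBytes = stat.getFileSizeInBytes();
        long onDiskSize = stat.getLogOffset() + appendedBytes;
        stat.setFileSizeInBytes(onDiskSize);
        return onDiskSize;
    }

    public static void main(String[] args) {
        DeltaStat stat = new DeltaStat(1024L, 256L);
        System.out.println(finalizeFileSize(stat)); // prints 1280
    }
}
```

In this sketch the update is not idempotent: calling `finalizeFileSize` twice on the same stat would add the offset twice, which is another reason to keep the before and after meanings of the field clearly separated.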
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>