wombatu-kun commented on code in PR #18341:
URL: https://github.com/apache/hudi/pull/18341#discussion_r2998813472
##########
hudi-hadoop-common/src/main/java/org/apache/hudi/io/lance/HoodieBaseLanceWriter.java:
##########
@@ -214,6 +216,15 @@ public void close() throws IOException {
}
}
+ /**
+ * Returns the total number of bytes accumulated across all flushed Arrow batches.
+ * Computed as the sum of each field vector's buffer size at flush time, providing
+ * an uncompressed estimate analogous to {@code ParquetWriter.getDataSize()}.
+ */
+ protected long getDataSize() {
+ return totalFlushedDataSize;
Review Comment:
Fixed: getDataSize() returns the estimated data size in bytes, covering both
the flushed batches and the current in-progress batch.
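
For illustration only, here is a minimal sketch of that accounting. It is not the code in this PR: the class name DataSizeSketch, the currentBatchSize field, and the append/flush methods are hypothetical stand-ins, and the real writer would derive the in-progress size from its Arrow vectors rather than from a plain counter. Only the totalFlushedDataSize name comes from the diff above.

```java
// Hypothetical stand-in for the writer; only the size accounting is modeled.
class DataSizeSketch {
    // Bytes from batches that have already been flushed.
    private long totalFlushedDataSize = 0L;
    // Bytes buffered in the batch that has not been flushed yet (assumed counter).
    private long currentBatchSize = 0L;

    // Record bytes added to the in-progress batch.
    void append(long bytes) {
        currentBatchSize += bytes;
    }

    // Move the in-progress bytes into the flushed total and start a new batch.
    void flush() {
        totalFlushedDataSize += currentBatchSize;
        currentBatchSize = 0L;
    }

    // Estimated size: flushed batches plus the current in-progress batch.
    long getDataSize() {
        return totalFlushedDataSize + currentBatchSize;
    }
}
```

With this shape the reported size grows as rows are buffered and does not change at the moment of a flush, since the bytes only move from one counter to the other.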