hudi-agent commented on code in PR #18860:
URL: https://github.com/apache/hudi/pull/18860#discussion_r3313492630
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/IOUtils.java:
##########
@@ -98,6 +98,45 @@ public static long
getMaxMemoryPerCompaction(TaskContextSupplier context, Map<St
return getMaxMemoryAllowedForMerge(context, fraction);
}
+ /**
+ * Returns {@code Option.of(per-task memory ceiling in bytes)} when the
engine exposes
+ * {@link EngineProperty#TOTAL_MEMORY_AVAILABLE}, {@link
EngineProperty#MEMORY_FRACTION_IN_USE},
+ * and {@link EngineProperty#TOTAL_CORES_PER_EXECUTOR} (Spark). Returns
{@code Option.empty()}
+ * otherwise (engines that do not expose memory/cores) so callers can fall
back to a static
+ * config-driven cap rather than the spillable-map 1GB default.
+ *
+ * <p>Formula mirrors {@link #getMaxMemoryAllowedForMerge}: {@code
userAvailableMemory =
+ * totalExecutorMemory * (1 - memoryFraction) / executorTaskNum}, then
multiplied by the
+ * supplied fraction. Honors {@link EngineProperty#SINGLE_TASK_CORES} when
present (divides
+ * cores per task into total cores to get task slots).
+ *
+ * <p>Distinct from {@link #getMaxMemoryAllowedForMerge} in two ways:
returns {@code Option}
+ * rather than collapsing the absent case to a default, and accepts the
floor as a parameter
+ * so callers (e.g., {@code HoodieAppendHandle}) can use a smaller floor
than the spillable-map
+ * 100MB.
+ */
+ public static Option<Long> getMaxMemoryAllowedForLogAppend(
Review Comment:
🤖 nit: `minFloor` is redundant since a floor is already a minimum — could
you rename to just `floor` or `minBytes`? Same applies to the docs above.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -615,12 +646,22 @@ protected boolean needsUpdateLocation() {
return true;
}
- private void writeToBuffer(HoodieRecord<T> record) {
+ /**
+ * Buffers {@code record} for the next log block. Returns the record
actually appended to
+ * {@link #recordList} — the post-{@code prepareRecord} clone of {@code
populatedRecord}
+ * carrying a fully-materialized Avro {@code IndexedRecord} with prepended
meta-fields —
+ * or {@code null} when nothing was appended to {@code recordList} (delete,
ignored, or
+ * partition-mismatch failure). The returned reference is what {@link
#flushToDiskIfRequired}
+ * sizes; sizing the incoming record under-counts heap because the incoming
payload is
+ * typically still in its compact/deflated wire form, whereas the buffered
record holds
+ * the fully-deserialized Avro graph plus meta-fields.
Review Comment:
🤖 nit: this javadoc (and the similar ones on
`bufferRecord`/`bufferInsertAndUpdate`/`flushToDiskIfRequired`) is quite long
and leans on internal implementation details (`prepareRecord` clones, deflated
payloads, etc.). Could you trim to a 1-2 line contract + a brief "why" note?
Future refactors will silently drift from these descriptions.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -691,6 +752,42 @@ protected void flushToDiskIfRequired(HoodieRecord record,
boolean appendDeleteBl
}
}
+ @VisibleForTesting
+ void setSizeEstimator(SizeEstimator<HoodieRecord> sizeEstimator) {
+ this.sizeEstimator = sizeEstimator;
+ }
+
+ @VisibleForTesting
+ long getAverageRecordSize() {
+ return averageRecordSize;
+ }
+
+ @VisibleForTesting
+ long getNumberOfRecords() {
+ return numberOfRecords;
+ }
+
+ @VisibleForTesting
+ long getEstimatedNumberOfBytesWritten() {
+ return estimatedNumberOfBytesWritten;
+ }
+
+ @VisibleForTesting
+ void simulateBufferedRecord(HoodieRecord bufferedRecord) {
Review Comment:
🤖 nit: `simulateBufferedRecord` is purely a test fixture (increments
`numberOfRecords` then calls the gate). Could you keep this logic inside the
test's `TestableAppendHandle` subclass rather than the production class? The
other `@VisibleForTesting` accessors here are simple getters/setters, but this
one bakes test orchestration into the production API.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]