hudi-agent commented on code in PR #18860:
URL: https://github.com/apache/hudi/pull/18860#discussion_r3431469556


##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMemoryConfig.java:
##########
@@ -64,6 +64,22 @@ public class HoodieMemoryConfig extends HoodieConfig {
   public static final long DEFAULT_MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES = 
HoodieCommonConfig.DEFAULT_MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES;
   // Minimum memory size (100MB) for the spillable map.
   public static final long DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES = 100 
* 1024 * 1024L;
+  // Minimum memory size (16MB) for the HoodieAppendHandle in-memory log-block 
buffer.
+  // Floor is smaller than the spillable-map floor: the append-handle buffer 
is per-handle and
+  // many handles may be active concurrently in a task, so we tolerate smaller 
blocks on tight
+  // executors to avoid OOM.
+  public static final long MIN_MEMORY_FOR_LOG_APPEND_BUFFER_IN_BYTES = 16 * 
1024 * 1024L;
+
+  public static final ConfigProperty<String> 
MAX_MEMORY_FRACTION_FOR_LOG_APPEND = ConfigProperty
+      .key("hoodie.memory.logfile.append.fraction")
+      .defaultValue(String.valueOf(0.6))

Review Comment:
   🤖 nit: could you use `"0.6"` directly instead of `String.valueOf(0.6)`? 
Every other Hudi config default that's a numeric string uses a string literal, 
and wrapping a literal double in `String.valueOf` is a bit noisy here.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/IOUtils.java:
##########
@@ -98,6 +98,45 @@ public static long 
getMaxMemoryPerCompaction(TaskContextSupplier context, Map<St
     return getMaxMemoryAllowedForMerge(context, fraction);
   }
 
+  /**
+   * Returns {@code Option.of(per-task memory ceiling in bytes)} when the 
engine exposes
+   * {@link EngineProperty#TOTAL_MEMORY_AVAILABLE}, {@link 
EngineProperty#MEMORY_FRACTION_IN_USE},
+   * and {@link EngineProperty#TOTAL_CORES_PER_EXECUTOR} (Spark). Returns 
{@code Option.empty()}
+   * otherwise (engines that do not expose memory/cores) so callers can fall 
back to a static
+   * config-driven cap rather than the spillable-map 1GB default.
+   *
+   * <p>Formula mirrors {@link #getMaxMemoryAllowedForMerge}: {@code 
userAvailableMemory =
+   * totalExecutorMemory * (1 - memoryFraction) / executorTaskNum}, then 
multiplied by the
+   * supplied fraction. Honors {@link EngineProperty#SINGLE_TASK_CORES} when 
present (divides
+   * cores per task into total cores to get task slots).
+   *
+   * <p>Distinct from {@link #getMaxMemoryAllowedForMerge} in two ways: 
returns {@code Option}
+   * rather than collapsing the absent case to a default, and accepts the 
minimum-bytes value
+   * as a parameter so callers (e.g., {@code HoodieAppendHandle}) can use a 
smaller floor than
+   * the spillable-map 100MB.
+   */
+  public static Option<Long> getMaxMemoryAllowedForLogAppend(
+      TaskContextSupplier context, String maxMemoryFraction, long minBytes) {
+    Option<String> totalMemoryOpt = 
context.getProperty(EngineProperty.TOTAL_MEMORY_AVAILABLE);
+    Option<String> memoryFractionOpt = 
context.getProperty(EngineProperty.MEMORY_FRACTION_IN_USE);
+    Option<String> totalCoresOpt = 
context.getProperty(EngineProperty.TOTAL_CORES_PER_EXECUTOR);
+    if (!(totalMemoryOpt.isPresent() && memoryFractionOpt.isPresent() && 
totalCoresOpt.isPresent())) {
+      return Option.empty();
+    }
+    long executorMemoryInBytes = Long.parseLong(totalMemoryOpt.get());
+    double memoryFraction = Double.parseDouble(memoryFractionOpt.get());
+    double appendFraction = Double.parseDouble(maxMemoryFraction);
+    long executorCores = Long.parseLong(totalCoresOpt.get());
+    Option<String> singleTaskCoresOpt = 
context.getProperty(EngineProperty.SINGLE_TASK_CORES);

Review Comment:
   🤖 nit: `executorTaskNum` reads a bit like a historical count or queue depth 
— would `taskSlots` (or `concurrentTasks`) make the intent clearer? The 
variable represents "how many tasks can run simultaneously on this executor", 
which `taskSlots` captures more directly.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to