Davis-Zhang-Onehouse opened a new pull request, #18824:
URL: https://github.com/apache/hudi/pull/18824

   ### Describe the issue this Pull Request addresses
   
   `HoodieMetadataWriteUtils.createMetadataWriteConfig` builds the MDT 
`HoodieWriteConfig` from scratch and does not copy `HoodieCommonConfig` / 
`HoodieMemoryConfig` values from the data-table write config. As a result, user 
overrides in the spillable-map config family take effect for the data-table 
writer but are silently ignored by the MDT writer/compactor.
   
   The most visible symptom is that 
`hoodie.common.spillable.diskmap.type=ROCKS_DB` has no effect on MDT 
compaction. `HoodieCompactor` reads the value via 
`config.getCommonConfig().getSpillableDiskMapType()`, and `commonConfig` on the 
MDT `HoodieWriteConfig` is the default-only instance. MDT compaction tasks 
therefore continue to use a BITCASK `ExternalSpillableMap` and can stall in 
`BitCaskDiskMap$CompressionHandler.decompressBytes` during merges even after 
operators apply the override at the table level.
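
   The symptom can be modeled without Hudi on the classpath. The sketch below is illustrative only: `DefaultOnlyConfigSketch`, `buildMdtPropsFromScratch`, and `spillableDiskMapType` are hypothetical stand-ins built on `java.util.Properties`, not Hudi APIs, and the BITCASK default is taken from the description above.

```java
import java.util.Properties;

// Illustrative model only: plain Properties stand in for HoodieWriteConfig.
final class DefaultOnlyConfigSketch {
  static final String DISKMAP_TYPE_KEY = "hoodie.common.spillable.diskmap.type";
  // Default assumed from the PR description (BITCASK).
  static final String DEFAULT_DISKMAP_TYPE = "BITCASK";

  // Hypothetical stand-in for reading the disk map type off a config.
  static String spillableDiskMapType(Properties props) {
    return props.getProperty(DISKMAP_TYPE_KEY, DEFAULT_DISKMAP_TYPE);
  }

  // Models the pre-fix behavior: the MDT config is built from scratch and
  // nothing from the data-table config is carried over.
  static Properties buildMdtPropsFromScratch(Properties dataTableProps) {
    return new Properties();
  }

  public static void main(String[] args) {
    Properties dataTable = new Properties();
    dataTable.setProperty(DISKMAP_TYPE_KEY, "ROCKS_DB");

    Properties mdt = buildMdtPropsFromScratch(dataTable);

    System.out.println(spillableDiskMapType(dataTable)); // prints ROCKS_DB
    System.out.println(spillableDiskMapType(mdt));       // prints BITCASK
  }
}
```

   In this model the override is visible to the data-table side but the MDT side falls back to the default, which matches the behavior reported above.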
   
   ### Summary and Changelog
   
   Inherit the configs that drive the MDT writer's `ExternalSpillableMap` and 
`HoodieMergedLogRecordScanner`:
   
   | Key | Defined in |
   |-----|-----------|
   | `hoodie.common.spillable.diskmap.type` | HoodieCommonConfig |
   | `hoodie.common.diskmap.compression.enabled` | HoodieCommonConfig |
   | `hoodie.memory.spillable.map.path` | HoodieMemoryConfig |
   | `hoodie.memory.compaction.max.size` | HoodieCommonConfig / HoodieMemoryConfig |
   | `hoodie.memory.compaction.fraction` | HoodieMemoryConfig |
   | `hoodie.memory.merge.max.size` | HoodieMemoryConfig |
   | `hoodie.memory.merge.fraction` | HoodieMemoryConfig |
   | `hoodie.memory.dfs.buffer.max.size` | HoodieCommonConfig / HoodieMemoryConfig |
   
   Propagation is implemented as a static `MDT_INHERITED_SPILLABLE_MAP_CONFIGS` 
list driving a `containsKey`-guarded copy loop. The guard matters for the two 
`noDefaultValue()` configs (`SPILLABLE_MAP_BASE_PATH` and 
`MAX_MEMORY_FOR_COMPACTION`): without it, the MDT side would observe a 
"user-set" value when the data-table side actually relies on 
`IOUtils.getMaxMemoryPerCompaction`'s fraction fallback or the inferred default 
spill path, breaking that fallback chain.
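
   A minimal sketch of a `containsKey`-guarded copy loop follows. It is not the PR's code: it uses `java.util.Properties` instead of `HoodieWriteConfig`, the helper `inheritExplicitlySet` is hypothetical, and the list is re-declared here from the table above purely for illustration.

```java
import java.util.List;
import java.util.Properties;

// Illustrative model only: plain Properties stand in for the write configs.
final class SpillableConfigInheritanceSketch {

  // Keys copied from the table in this description. The constant name follows
  // the PR text; its real type and location in Hudi are not shown here.
  static final List<String> MDT_INHERITED_SPILLABLE_MAP_CONFIGS = List.of(
      "hoodie.common.spillable.diskmap.type",
      "hoodie.common.diskmap.compression.enabled",
      "hoodie.memory.spillable.map.path",
      "hoodie.memory.compaction.max.size",
      "hoodie.memory.compaction.fraction",
      "hoodie.memory.merge.max.size",
      "hoodie.memory.merge.fraction",
      "hoodie.memory.dfs.buffer.max.size");

  // Hypothetical helper: copy a key only if the data-table config explicitly
  // contains it, so unset keys stay unset on the MDT side and any downstream
  // fallback logic still sees them as absent.
  static Properties inheritExplicitlySet(Properties dataTableProps) {
    Properties mdtProps = new Properties();
    for (String key : MDT_INHERITED_SPILLABLE_MAP_CONFIGS) {
      if (dataTableProps.containsKey(key)) {
        mdtProps.setProperty(key, dataTableProps.getProperty(key));
      }
    }
    return mdtProps;
  }

  public static void main(String[] args) {
    Properties dataTable = new Properties();
    dataTable.setProperty("hoodie.common.spillable.diskmap.type", "ROCKS_DB");

    Properties mdt = inheritExplicitlySet(dataTable);

    // The explicit override is inherited.
    System.out.println(mdt.getProperty("hoodie.common.spillable.diskmap.type"));
    // A key the user never set is not materialized on the MDT side.
    System.out.println(mdt.containsKey("hoodie.memory.compaction.max.size"));
  }
}
```

   Driving the loop from one explicit list keeps the inherited surface small and easy to audit, and the guard ensures only user-provided values cross over.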
   
   This mirrors the precedent set by `fix(metadata): propagate timeline server 
config from main dataset to metadata (#17486)`.
   
   ### Impact
   
   No new configs; no default value changes; no public API change. Behavior 
change is limited to MDT writers when the user has set one of the eight keys 
above: previously the MDT writer silently ignored the override and used the 
default; now it honors the user value.
   
   ### Risk Level
   
   low. The propagation is scoped to a small explicit list of configs, 
defaults remain unchanged, and the `containsKey` guard leaves MDT behavior 
as-is when the user sets no override. Existing `TestHoodieMetadataWriteUtils` 
tests continue to pass (19/19).
   
   ### Documentation Update
   
   none — existing config docs already describe the user-facing keys; this PR 
just makes the MDT writer honor them.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added 
(`TestHoodieMetadataWriteUtils#testSpillableMapConfigPropagation`)

