suryaprasanna commented on code in PR #17942:
URL: https://github.com/apache/hudi/pull/17942#discussion_r2750378830
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java:
##########
@@ -201,6 +201,16 @@ public class HoodieCleanConfig extends HoodieConfig {
+ "table receives updates/deletes. Another reason to turn this on, would be to ensure data residing in bootstrap "
+ "base files are also physically deleted, to comply with data privacy enforcement processes.");
+ public static final ConfigProperty<Boolean> USE_LOCAL_ENGINE_FOR_METADATA_NON_PARTITIONED_DATASETS = ConfigProperty
+ .key("hoodie.clean.planner.use.local.engine.on.metadata.and.non-partitioned.tables")
+ .defaultValue(true)
+ .markAdvanced()
+ .sinceVersion("1.2.0")
+ .withDocumentation("Some datasets have huge record_index partition, listing this partition is causing OOM errors on clean planner. "
+ + "So, if we increase executor memory it will increase for all the executors. So, by passing local engine context to clean "
+ + "planner, listing is done on the driver. In that case, if OOM error were come we can increase driver memory alone to "
Review Comment:
@danny0405 thank you for reviewing the PR. The local engine can also be used
when the data table is non-partitioned. I don't want to widen the scope beyond
that for the main table, because it can backfire: the entire clean planning
would then run on the driver, and users may not even consider that this could
happen.
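
To make the intended scope concrete, here is a minimal sketch of the decision
described above. It is not the code in this PR: the class
`CleanPlannerEngineChoice`, the method `useLocalEngine`, and its three boolean
parameters are hypothetical names used only for illustration, and the sketch
assumes the flag applies to metadata tables and to non-partitioned data tables
only.

```java
public class CleanPlannerEngineChoice {

    /**
     * Hypothetical helper (not part of Hudi): decides whether the clean
     * planner should list files with a local, driver-side engine context.
     *
     * @param flagEnabled     value of the proposed config flag
     * @param isMetadataTable true when planning for the metadata table
     * @param isPartitioned   true when the data table is partitioned
     */
    static boolean useLocalEngine(boolean flagEnabled,
                                  boolean isMetadataTable,
                                  boolean isPartitioned) {
        if (!flagEnabled) {
            // Flag off: always keep the distributed engine context.
            return false;
        }
        // Scope is deliberately narrow: metadata tables, or data tables
        // without partitions. Partitioned data tables stay distributed so
        // the whole clean planning does not land on the driver.
        return isMetadataTable || !isPartitioned;
    }

    public static void main(String[] args) {
        System.out.println("metadata table:       " + useLocalEngine(true, true, true));
        System.out.println("non-partitioned data: " + useLocalEngine(true, false, false));
        System.out.println("partitioned data:     " + useLocalEngine(true, false, true));
    }
}
```

Keeping partitioned data tables out of scope means that only the driver memory
needs tuning for the two cases where listing is known to be concentrated in a
single partition.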
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.