lokeshj1703 commented on code in PR #7881:
URL: https://github.com/apache/hudi/pull/7881#discussion_r1147219106
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/bootstrap/BootstrapMode.java:
##########
@@ -18,18 +18,27 @@
package org.apache.hudi.client.bootstrap;
+import org.apache.hudi.common.config.EnumDefault;
+import org.apache.hudi.common.config.EnumDescription;
+import org.apache.hudi.common.config.EnumFieldDescription;
+
/**
* Identifies different types of bootstrap.
*/
+@EnumDescription("Bootstrap types")
public enum BootstrapMode {
/**
* In this mode, record level metadata is generated for each source record
and both original record and metadata
* for each record copied.
*/
+ @EnumFieldDescription("In this mode, record level metadata is generated for
each source record and both original record and metadata "
+ + "for each record copied.")
FULL_RECORD,
/**
* In this mode, record level metadata alone is generated for each source
record and stored in new bootstrap location.
*/
+ @EnumDefault
Review Comment:
I think default enum should be specified as parameter of
`enumDefaultStringValueAndDocumentation` since enum could be part of many
configs with different default values.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieBootstrapConfig.java:
##########
@@ -55,12 +52,10 @@ public class HoodieBootstrapConfig extends HoodieConfig {
public static final ConfigProperty<String> PARTITION_SELECTOR_REGEX_MODE =
ConfigProperty
.key("hoodie.bootstrap.mode.selector.regex.mode")
- .defaultValue(METADATA_ONLY.name())
- .sinceVersion("0.6.0")
- .withValidValues(METADATA_ONLY.name(), FULL_RECORD.name())
- .withDocumentation("Bootstrap mode to apply for partition paths, that
match regex above. "
- + "METADATA_ONLY will generate just skeleton base files with
keys/footers, avoiding full cost of rewriting the dataset. "
- + "FULL_RECORD will perform a full copy/rewrite of the data as a
Hudi table.");
Review Comment:
The enum field description is missing some of the documentation mentioned
here? Is this intentional?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -2929,12 +2892,12 @@ private void
autoAdjustConfigsForConcurrencyMode(boolean isLockProviderPropertyS
// If user does not set the lock provider, likely that the
concurrency mode is not set either
// Override the configs for metadata table
writeConfig.setValue(WRITE_CONCURRENCY_MODE.key(),
- WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.value());
Review Comment:
This would be a breaking change right? We are changing the possible config
values.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -1912,6 +1871,10 @@ public Option<HoodieLogBlock.HoodieLogBlockType>
getLogDataBlockFormat() {
.map(HoodieLogBlock.HoodieLogBlockType::fromId);
}
+ public void testEnums() {
Review Comment:
Can we add a test to `TestConfigProperty` which does a string comparison for
documentation generated?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/bootstrap/BootstrapMode.java:
##########
@@ -18,18 +18,27 @@
package org.apache.hudi.client.bootstrap;
+import org.apache.hudi.common.config.EnumDefault;
+import org.apache.hudi.common.config.EnumDescription;
+import org.apache.hudi.common.config.EnumFieldDescription;
+
/**
* Identifies different types of bootstrap.
*/
+@EnumDescription("Bootstrap types")
Review Comment:
Do we need to add more information here?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java:
##########
@@ -64,25 +62,20 @@ public class HoodieCleanConfig extends HoodieConfig {
.withDocumentation("Number of commits to retain, without cleaning. This
will be retained for num_of_commits * time_between_commits "
+ "(scheduled). This also directly translates into how much data
retention the table supports for incremental queries.");
- public static final ConfigProperty<String> CLEANER_HOURS_RETAINED =
ConfigProperty.key("hoodie.cleaner.hours.retained")
+ public static final ConfigProperty<String> CLEANER_HOURS_RETAINED =
ConfigProperty
+ .key("hoodie.cleaner.hours.retained")
.defaultValue("24")
.withDocumentation("Number of hours for which commits need to be
retained. This config provides a more flexible option as"
+ "compared to number of commits retained for cleaning service.
Setting this property ensures all the files, but the latest in a file group,"
+ " corresponding to commits with commit times older than the
configured number of hours to be retained are cleaned.");
public static final ConfigProperty<String> CLEANER_POLICY = ConfigProperty
.key("hoodie.cleaner.policy")
- .defaultValue(HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name())
- .withDocumentation("Cleaning policy to be used. The cleaner service
deletes older file slices files to re-claim space."
- + " By default, cleaner spares the file slices written by the last N
commits, determined by " + CLEANER_COMMITS_RETAINED.key()
- + " Long running query plans may often refer to older file slices
and will break if those are cleaned, before the query has had"
- + " a chance to run. So, it is good to make sure that the data is
retained for more than the maximum query execution time");
Review Comment:
Similar to above this is not mentioned in new documentation. Can we check
the documentation for other configs as well? Or specify the intentional changes
through github comments?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -692,9 +687,17 @@ public static SpatialCurveCompositionStrategyType
fromValue(String value) {
/**
* Layout optimization strategies such as Z-order/Hilbert space-curves, etc
*/
+ @EnumDescription("Determines ordering strategy for records layout
optimization")
public enum LayoutOptimizationStrategy {
Review Comment:
Do you think approach like `ErrorWriteFailureStrategy` could be helpful here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]