yihua commented on code in PR #7881:
URL: https://github.com/apache/hudi/pull/7881#discussion_r1156364101
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java:
##########
@@ -147,17 +129,16 @@ public class HoodieCleanConfig extends HoodieConfig {
public static final ConfigProperty<String> FAILED_WRITES_CLEANER_POLICY =
ConfigProperty
.key("hoodie.cleaner.policy.failed.writes")
.defaultValue(HoodieFailedWritesCleaningPolicy.EAGER.name())
+ .withEnumDocumentation(HoodieFailedWritesCleaningPolicy.class,
+ "note that LAZY policy is required when multi-writers are enabled.")
Review Comment:
nit: capitalize the first letter.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieBootstrapConfig.java:
##########
@@ -83,8 +79,8 @@ public class HoodieBootstrapConfig extends HoodieConfig {
public static final ConfigProperty<String> KEYGEN_TYPE = ConfigProperty
.key("hoodie.bootstrap.keygen.type")
.defaultValue(KeyGeneratorType.SIMPLE.name())
- .sinceVersion("0.9.0")
- .withDocumentation("Type of build-in key generator, currently support
SIMPLE, COMPLEX, TIMESTAMP, CUSTOM, NON_PARTITION, GLOBAL_DELETE");
+ .withEnumDocumentation(KeyGeneratorType.class, "Key generator class for
bootstrap")
Review Comment:
For the second argument, is the convention to add a period (`.`) at the end
or not? I see both in different enum configs.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -3028,11 +2993,11 @@ private void validate() {
Objects.requireNonNull(writeConfig.getString(BASE_PATH));
if (writeConfig.isEarlyConflictDetectionEnable()) {
checkArgument(writeConfig.getString(WRITE_CONCURRENCY_MODE)
-
.equalsIgnoreCase(WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.value()),
+
.equals(WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.name()),
Review Comment:
Same here, could we ignore case as before?
##########
hudi-common/src/main/java/org/apache/hudi/common/config/ConfigProperty.java:
##########
@@ -139,6 +144,52 @@ public ConfigProperty<T> withDocumentation(String doc) {
return new ConfigProperty<>(key, defaultValue, docOnDefaultValue, doc,
sinceVersion, deprecatedVersion, inferFunction, validValues, advanced,
alternatives);
}
+ public <U extends Enum<U>> ConfigProperty<T> withEnumDocumentation(Class<U>
e) {
+ return withEnumDocumentation(e,"");
+ }
+
+ private <U extends Enum<U>> boolean isDefaultField(Class<U> e, Field f) {
+ if (!hasDefaultValue()) {
+ return false;
+ }
+ if (defaultValue() instanceof String) {
+ return f.getName().equals(defaultValue());
+ }
+ return Enum.valueOf(e, f.getName()).equals(defaultValue());
+ }
+
+ public <U extends Enum<U>> ConfigProperty<T> withEnumDocumentation(Class<U>
e, String doc, String... internalOption) {
Review Comment:
Could we rename this as `withDocumentation` and remove `doc` and
`internalOption` for simplicity? `doc` content can be merged to
`@EnumDescription` . We can mark internal options in the docs.
##########
hudi-common/src/main/java/org/apache/hudi/common/util/queue/DisruptorWaitStrategyType.java:
##########
@@ -27,35 +30,50 @@
/**
* Enum for the type of waiting strategy in Disruptor Queue.
*/
+@EnumDescription("Type of waiting strategy in the Disruptor Queue")
Review Comment:
We can keep the docs the same as before for now. Any docs improvement can
be in a separate PR.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieBootstrapConfig.java:
##########
@@ -55,12 +52,10 @@ public class HoodieBootstrapConfig extends HoodieConfig {
public static final ConfigProperty<String> PARTITION_SELECTOR_REGEX_MODE =
ConfigProperty
.key("hoodie.bootstrap.mode.selector.regex.mode")
- .defaultValue(METADATA_ONLY.name())
- .sinceVersion("0.6.0")
- .withValidValues(METADATA_ONLY.name(), FULL_RECORD.name())
- .withDocumentation("Bootstrap mode to apply for partition paths, that
match regex above. "
- + "METADATA_ONLY will generate just skeleton base files with
keys/footers, avoiding full cost of rewriting the dataset. "
- + "FULL_RECORD will perform a full copy/rewrite of the data as a
Hudi table.");
Review Comment:
@jonvex I think @lokeshj1703 means that `avoiding full cost of rewriting the
dataset` is missing in the new docs to indicate the benefit.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieBootstrapConfig.java:
##########
@@ -83,8 +79,8 @@ public class HoodieBootstrapConfig extends HoodieConfig {
public static final ConfigProperty<String> KEYGEN_TYPE = ConfigProperty
.key("hoodie.bootstrap.keygen.type")
.defaultValue(KeyGeneratorType.SIMPLE.name())
- .sinceVersion("0.9.0")
- .withDocumentation("Type of build-in key generator, currently support
SIMPLE, COMPLEX, TIMESTAMP, CUSTOM, NON_PARTITION, GLOBAL_DELETE");
+ .withEnumDocumentation(KeyGeneratorType.class, "Key generator class for
bootstrap")
Review Comment:
Also, should all the descriptions be in the `@EnumDescription`, instead of
having a second argument here for additional description?
##########
docker/demo/config/test-suite/multi-writer-1.properties:
##########
@@ -35,7 +35,7 @@ hoodie.datasource.write.recordkey.field=_row_key
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
hoodie.datasource.write.partitionpath.field=timestamp
-hoodie.write.concurrency.mode=optimistic_concurrency_control
+hoodie.write.concurrency.mode=OPTIMISTIC_CONCURRENCY_CONTROL
Review Comment:
Is upper-case config value required? Can you still ensure the lower-case
value `optimistic_concurrency_control` works as before?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -194,16 +188,18 @@ public class HoodieWriteConfig extends HoodieConfig {
public static final ConfigProperty<String> TIMELINE_LAYOUT_VERSION_NUM =
ConfigProperty
.key("hoodie.timeline.layout.version")
- .defaultValue(Integer.toString(TimelineLayoutVersion.VERSION_1))
+ .defaultValue(Integer.toString(TimelineLayoutVersion.CURR_VERSION))
+
.withValidValues(Integer.toString(TimelineLayoutVersion.VERSION_0),Integer.toString(TimelineLayoutVersion.VERSION_1))
.sinceVersion("0.5.1")
.withDocumentation("Controls the layout of the timeline. Version 0
relied on renames, Version 1 (default) models "
+ "the timeline as an immutable log relying only on atomic writes
for object storage.");
public static final ConfigProperty<HoodieFileFormat> BASE_FILE_FORMAT =
ConfigProperty
.key("hoodie.table.base.file.format")
.defaultValue(HoodieFileFormat.PARQUET)
- .withAlternatives("hoodie.table.ro.file.format")
- .withDocumentation("Base file format to store all the base file data.");
+ .withValidValues(HoodieFileFormat.PARQUET.name(),
HoodieFileFormat.ORC.name(), HoodieFileFormat.HFILE.name())
+ .withEnumDocumentation(HoodieFileFormat.class, "File format to store all
the base file data.", "HFILE")
Review Comment:
Why do we have a third argument `"HFILE"` here?
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java:
##########
@@ -265,4 +265,5 @@ public <T> String getStringOrThrow(ConfigProperty<T>
configProperty, String erro
throw new HoodieException(errorMessage);
}
}
+
Review Comment:
nit: revert the new line
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java:
##########
@@ -89,24 +89,7 @@ public class HoodieCleanConfig extends HoodieConfig {
return
Option.of(HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS.name());
}
return Option.empty();
- })
- .withDocumentation("Cleaning policy to be used. The cleaner service
deletes older file "
Review Comment:
There are new docs added to this config. Could you move them to the right
place?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -277,11 +266,9 @@ public class HoodieClusteringConfig extends HoodieConfig {
*/
public static final ConfigProperty<String>
LAYOUT_OPTIMIZE_SPATIAL_CURVE_BUILD_METHOD = ConfigProperty
.key(LAYOUT_OPTIMIZE_PARAM_PREFIX + "curve.build.method")
- .defaultValue("direct")
- .sinceVersion("0.10.0")
- .withDocumentation("Controls how data is sampled to build the
space-filling curves. "
- + "Two methods: \"direct\", \"sample\". The direct method is faster
than the sampling, "
- + "however sample method would produce a better data layout.");
+ .defaultValue(SpatialCurveCompositionStrategyType.DIRECT.name())
Review Comment:
similar here, we should make sure both lower- and upper-case names should
work.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -677,95 +664,43 @@ private String
getDefaultExecutionStrategyClassName(EngineType engineType) {
/**
* Type of a strategy for building Z-order/Hilbert space-filling curves.
*/
+ @EnumDescription("Type of a strategy for building Z-order/Hilbert
space-filling curves.")
public enum SpatialCurveCompositionStrategyType {
- DIRECT("direct"),
- SAMPLE("sample");
- private static final Map<String, SpatialCurveCompositionStrategyType>
VALUE_TO_ENUM_MAP =
- TypeUtils.getValueToEnumMap(SpatialCurveCompositionStrategyType.class,
e -> e.value);
-
- private final String value;
-
- SpatialCurveCompositionStrategyType(String value) {
- this.value = value;
- }
+ @EnumFieldDescription("Faster than sampling, but produces a worse(?)
layout. NEEDS BETTER DESCRIPTION")
+ DIRECT,
- public static SpatialCurveCompositionStrategyType fromValue(String value) {
- SpatialCurveCompositionStrategyType enumValue =
VALUE_TO_ENUM_MAP.get(value);
- if (enumValue == null) {
- throw new HoodieException(String.format("Invalid value (%s)", value));
- }
-
- return enumValue;
- }
+ @EnumFieldDescription("Slower than sampling, but produces a better(?)
layout. NEEDS BETTER DESCRIPTION")
+ SAMPLE
}
/**
* Layout optimization strategies such as Z-order/Hilbert space-curves, etc
*/
+ @EnumDescription("Determines ordering strategy for records layout
optimization")
public enum LayoutOptimizationStrategy {
- LINEAR("linear"),
- ZORDER("z-order"),
- HILBERT("hilbert");
-
- private static final Map<String, LayoutOptimizationStrategy>
VALUE_TO_ENUM_MAP =
- TypeUtils.getValueToEnumMap(LayoutOptimizationStrategy.class, e ->
e.value);
-
- private final String value;
-
- LayoutOptimizationStrategy(String value) {
- this.value = value;
- }
- @Nonnull
- public static LayoutOptimizationStrategy fromValue(String value) {
- LayoutOptimizationStrategy enumValue = VALUE_TO_ENUM_MAP.get(value);
- if (enumValue == null) {
- throw new HoodieException(String.format("Invalid value (%s)", value));
- }
+ @EnumFieldDescription("Order records lexicographically.")
+ LINEAR,
- return enumValue;
- }
+ @EnumFieldDescription("Order records along Z-order spatial-curve")
+ ZORDER,
- public String getValue() {
- return value;
- }
+ @EnumFieldDescription("Order records along Hilbert's spatial-curve")
+ HILBERT
}
+ @EnumDescription("Clustering Operator ADD DESCRIPTION")
Review Comment:
docs: `Clustering mode to use`
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleaningTriggerStrategy.java:
##########
@@ -18,7 +18,14 @@
package org.apache.hudi.table.action.clean;
+import org.apache.hudi.common.config.EnumDefault;
+import org.apache.hudi.common.config.EnumDescription;
+import org.apache.hudi.common.config.EnumFieldDescription;
+
+@EnumDescription("Controls when cleaning is scheduled")
public enum CleaningTriggerStrategy {
Review Comment:
Let's keep it. It is meant for extensibility.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]