nsivabalan commented on code in PR #18174:
URL: https://github.com/apache/hudi/pull/18174#discussion_r2830901657
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -217,6 +217,14 @@ public class HoodieClusteringConfig extends HoodieConfig {
.sinceVersion("0.14.0")
.withDocumentation("Whether to generate clustering plan when there is
only one file group involved, by default true");
+ public static final ConfigProperty<Boolean> EARLIER_INSTANTS_FIRST =
ConfigProperty
+ .key("hoodie.clustering.earlier_instants_first")
Review Comment:
can we rename the config key to
`hoodie.clustering.plan.strategy.file.slices.sort.order` ?
and default value for the config would be`size`
and another option is `instant_time, size`.
what do you think of this, instead of a boolean ?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -584,6 +592,11 @@ public Builder withClusteringTargetFileMaxBytes(long
targetFileSize) {
return this;
}
+ public Builder withEarlierInstantsFirst(Boolean earlierInstantsFirst) {
Review Comment:
we might need to change this based on the proposed feedback above
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/PartitionAwareClusteringPlanStrategy.java:
##########
@@ -67,11 +68,17 @@ protected Pair<Stream<HoodieClusteringGroup>, Boolean>
buildClusteringGroupsForP
List<Pair<List<FileSlice>, Integer>> fileSliceGroups = new ArrayList<>();
List<FileSlice> currentGroup = new ArrayList<>();
- // Sort fileSlices before dividing, which makes dividing more compact
+ // Sort file slices by instant time (when earlier-instants-first) and size
before dividing,
+ // so that older data files are clustered first and dividing is more
compact
+ Comparator<FileSlice> sortedFileSlicesComparator = Comparator
Review Comment:
can we neatly introduce two Comparators and use them here rather than
embedding it inline?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -217,6 +217,14 @@ public class HoodieClusteringConfig extends HoodieConfig {
.sinceVersion("0.14.0")
.withDocumentation("Whether to generate clustering plan when there is
only one file group involved, by default true");
+ public static final ConfigProperty<Boolean> EARLIER_INSTANTS_FIRST =
ConfigProperty
+ .key("hoodie.clustering.earlier_instants_first")
+ .defaultValue(false)
+ .markAdvanced()
+ .sinceVersion("0.14.0")
Review Comment:
fix since version.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]