nsivabalan commented on code in PR #13526: URL: https://github.com/apache/hudi/pull/13526#discussion_r2193964054
########## hudi-common/src/main/java/org/apache/hudi/Comparables.java: ########## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi; + +import org.apache.hudi.common.util.ValidationUtils; + +import java.io.Serializable; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Objects; +import java.util.function.Function; +import java.util.stream.Collectors; + +import static org.apache.hudi.common.model.HoodieRecord.DEFAULT_ORDERING_VALUE; + +public class Comparables implements Comparable, Serializable { + protected static final long serialVersionUID = 1L; + + private final List<Comparable> comparables; Review Comment: Instead of maintaining everything as a list even for single element, can we maintain two instance variables. one for single element and the other as list. and a private boolean flag to denote whether this is single element or multiple elements. lets avoid maintaining a list for single element ordering fields which will be most common use-case. ########## hudi-common/src/main/java/org/apache/hudi/Comparables.java: ########## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi; + +import org.apache.hudi.common.util.ValidationUtils; + +import java.io.Serializable; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Objects; +import java.util.function.Function; +import java.util.stream.Collectors; + +import static org.apache.hudi.common.model.HoodieRecord.DEFAULT_ORDERING_VALUE; + +public class Comparables implements Comparable, Serializable { + protected static final long serialVersionUID = 1L; + + private final List<Comparable> comparables; Review Comment: we can introduce two diff static methods to assist w/ creating Comparables either w/ single Comparable or a list of Comparable ########## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java: ########## @@ -900,8 +900,13 @@ static RecordMergeMode inferRecordMergeModeFromMergeStrategyId(String recordMerg } } - public String getPreCombineField() { - return getString(PRECOMBINE_FIELD); + public Option<List<String>> getPreCombineFieldList() { Review Comment: ok, I see there are diff set of users. one of them just tries to intercept the config value and sets it in some other props to determines the merge mode/payload etc. while the other set of callers are actually using it for merging which needs a list. we can keep both methods then. but lets ensure we call `getPreCombineFields` from within `getPreCombineFieldList` and then convert to list. Also, we can trim for empty string. `filter(StringUtils::nonEmpty)` before returning. ########## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java: ########## @@ -900,8 +900,13 @@ static RecordMergeMode inferRecordMergeModeFromMergeStrategyId(String recordMerg } } - public String getPreCombineField() { - return getString(PRECOMBINE_FIELD); + public Option<List<String>> getPreCombineFieldList() { Review Comment: why do we need two methods(getPreCombineFieldList and getPreCombineFields). lets try to have just 1 method and fix callers to use the same. ########## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala: ########## @@ -127,9 +128,9 @@ class HoodieCatalogTable(val spark: SparkSession, var table: CatalogTable) exten lazy val primaryKeys: Array[String] = tableConfig.getRecordKeyFields.orElse(Array.empty) /** - * PreCombine Field + * Comparables Field */ - lazy val preCombineKey: Option[String] = Option(tableConfig.getPreCombineField) + lazy val preCombineKey: Option[String] = convertHudiOptionToScalaOption(tableConfig.getPreCombineFields) Review Comment: `preCombineKeys` ########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/SevenToEightUpgradeHandler.java: ########## @@ -175,7 +175,7 @@ static void upgradePartitionFields(HoodieWriteConfig config, HoodieTableConfig t static void upgradeMergeMode(HoodieTableConfig tableConfig, Map<ConfigProperty, String> tablePropsToAdd) { String payloadClass = tableConfig.getPayloadClass(); - String preCombineField = tableConfig.getPreCombineField(); + String preCombineField = tableConfig.getPreCombineFields().orElse(null); Review Comment: `preCombineFields` ########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ########## @@ -169,8 +170,9 @@ public class HoodieWriteConfig extends HoodieConfig { public static final ConfigProperty<String> PRECOMBINE_FIELD_NAME = ConfigProperty .key("hoodie.datasource.write.precombine.field") .noDefaultValue() - .withDocumentation("Field used in preCombining before actual write. When two records have the same key value, " - + "we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..)"); + .withDocumentation("Comma separated list of fields used in preCombining before actual write. When two records have the same key value, " Review Comment: can we enhance the docs to call out this is not just for combining records w/n same batch. but also used w/ merge mode like EventTime based merge mode. ########## hudi-common/src/main/java/org/apache/hudi/Comparables.java: ########## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi; + +import org.apache.hudi.common.util.ValidationUtils; + +import java.io.Serializable; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Objects; +import java.util.function.Function; +import java.util.stream.Collectors; + +import static org.apache.hudi.common.model.HoodieRecord.DEFAULT_ORDERING_VALUE; + +public class Comparables implements Comparable, Serializable { + protected static final long serialVersionUID = 1L; + + private final List<Comparable> comparables; + + public Comparables(List<Comparable> comparables) { + this.comparables = comparables; + } + + public Comparables(Comparable comparable) { + this.comparables = Collections.singletonList(comparable); + } + + public static boolean isDefault(Comparable orderingVal) { Review Comment: Can we move static methods to the end of the class ########## hudi-common/src/main/java/org/apache/hudi/Comparables.java: ########## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi; + +import org.apache.hudi.common.util.ValidationUtils; + +import java.io.Serializable; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Objects; +import java.util.function.Function; +import java.util.stream.Collectors; + +import static org.apache.hudi.common.model.HoodieRecord.DEFAULT_ORDERING_VALUE; Review Comment: java doc please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
