vinothchandar commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r891876288
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -123,6 +124,12 @@ public class HoodieWriteConfig extends HoodieConfig {
         .withDocumentation("Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting. "
             + "This will render any value set for PRECOMBINE_FIELD_OPT_VAL in-effective");
+  public static final ConfigProperty<String> COMBINE_ENGINE_CLASS_NAME = ConfigProperty
Review Comment:
Can we make it such that we fall back to Avro-based merging (and thus support existing payloads, with a potential performance penalty) if an engine-optimized implementation does not exist? e.g. merging using Spark's Row objects.
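To make the fallback idea concrete, here is a minimal sketch of how resolution could work. This is hypothetical illustration only, not the PR's actual API: the registry, the `MergeEngineResolver` class, and the Spark class name are all invented for the example; only `HoodieAvroRecordCombiningEngine` comes from the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve an engine-optimized combining implementation,
// falling back to Avro-based merging when none is registered.
public class MergeEngineResolver {

  // Fallback from the PR that works for any existing payload,
  // at the cost of converting engine-native records to Avro.
  static final String AVRO_FALLBACK =
      "org.apache.hudi.common.model.HoodieAvroRecordCombiningEngine";

  // Hypothetical registry: engine name -> engine-optimized merge class.
  private static final Map<String, String> ENGINE_IMPLS = new HashMap<>();

  public static void register(String engine, String implClass) {
    ENGINE_IMPLS.put(engine, implClass);
  }

  // Use the engine-optimized class if one was registered;
  // otherwise fall back to Avro-based merging.
  public static String resolve(String engine) {
    return ENGINE_IMPLS.getOrDefault(engine, AVRO_FALLBACK);
  }

  public static void main(String[] args) {
    // Hypothetical Spark-row implementation name, for illustration.
    register("SPARK", "com.example.SparkRowCombiningEngine");
    System.out.println(resolve("SPARK")); // engine-optimized path
    System.out.println(resolve("FLINK")); // nothing registered -> Avro fallback
  }
}
```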
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -234,6 +235,12 @@ public class HoodieCompactionConfig extends HoodieConfig {
         + "the record payload class to merge records in the log against each other, merge again with the base file and "
         + "produce the final record to be written after compaction.");
+  public static final ConfigProperty<String> COMBINE_ENGINE_CLASS_NAME = ConfigProperty
+ .key("hoodie.compaction.combine.engine.class")
+ .defaultValue(HoodieAvroRecordCombiningEngine.class.getName())
+      .withDocumentation("Combine engine class to use for performing compactions, i.e merge delta logs with current base file and then "
Review Comment:
This needs to be made clearer IMO. Can we make the doc more about what a "combine engine" is, rather than explaining what compaction does?
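For instance, the doc could lead with what a combine engine is. Suggested wording only, as a sketch:

```java
.withDocumentation("Implementation of HoodieRecordCombiningEngine to use. A combine engine "
    + "encapsulates how two versions of the same record are merged, e.g. during compaction "
    + "when log records are merged with the base file. The default engine merges Avro "
    + "records; engine-specific implementations can merge native representations such as "
    + "Spark rows instead.")
```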
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordCombiningEngine.java:
##########
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.hudi.common.util.Option;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.Properties;
+
+public interface HoodieRecordCombiningEngine extends Serializable {
Review Comment:
Can we call this `HoodieMerge` instead? IIUC this is the new API that replaces the payload abstraction? Then I would love some higher-level discussion on the RFC first.
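To sketch what the renamed API could look like: a single engine-agnostic `HoodieMerge<T>` where `T` is the engine-native record type (Avro GenericRecord, Spark Row, Hive ArrayWritable, ...). The method names and signatures below are illustrative assumptions, not the PR's final shape; they mirror today's payload semantics (`preCombine` / `combineAndGetUpdateValue`).

```java
import java.io.Serializable;
import java.util.Optional;

// Hypothetical sketch of the suggested rename; all names are illustrative.
public class HoodieMergeSketch {

  interface HoodieMerge<T> extends Serializable {
    // Pick the winner among two incoming records with the same key
    // (the role the precombine field plays for payloads today).
    T preCombine(T older, T newer);

    // Merge an incoming record with the current on-storage value;
    // an empty result signals a delete.
    Optional<T> combineAndGetUpdateValue(T current, T incoming);
  }

  public static void main(String[] args) {
    // Trivial "largest value wins" merge over integers, for illustration only.
    HoodieMerge<Integer> latestWins = new HoodieMerge<Integer>() {
      @Override public Integer preCombine(Integer older, Integer newer) {
        return Math.max(older, newer);
      }
      @Override public Optional<Integer> combineAndGetUpdateValue(Integer current, Integer incoming) {
        return Optional.of(preCombine(current, incoming));
      }
    };
    System.out.println(latestWins.preCombine(3, 7)); // prints 7
  }
}
```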
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##########
@@ -154,6 +155,12 @@ public class HoodieTableConfig extends HoodieConfig {
         .withDocumentation("Payload class to use for performing compactions, i.e merge delta logs with current base file and then "
             + " produce a new base file.");
+  public static final ConfigProperty<String> COMBINE_ENGINE_CLASS_NAME = ConfigProperty
+ .key("hoodie.compaction.combine.engine.class")
+ .defaultValue(HoodieAvroRecordCombiningEngine.class.getName())
Review Comment:
So the same table can be read by Spark or Hive, for example. I would love for such queries to be able to read and merge using the engine's POJO (Spark -> Row, Hive -> ArrayWritable), if the user chooses to write such a "combining engine" implementation. My thought here is that we should be able to discover whether that ability exists for the configured class here.
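One way the discovery could be sketched: per-engine capability marker interfaces that a reader can probe before choosing the engine-native path over the Avro fallback. Everything below (the marker interfaces, `MergeCapability`, `SparkAwareMerge`) is a hypothetical illustration, not part of the PR.

```java
// Hypothetical sketch: discover at query time whether the configured merge
// class can operate on the engine's native record representation.
public class MergeCapability {

  // Hypothetical capability markers, one per engine record shape.
  interface SupportsSparkRow {}
  interface SupportsHiveArrayWritable {}

  // A user-supplied implementation that opted into Spark-native merging.
  static class SparkAwareMerge implements SupportsSparkRow {}

  // The reader probes the configured class and only takes the engine-native
  // path when the capability is present; otherwise it converts to Avro.
  static boolean canMergeNatively(Class<?> mergeClass, Class<?> capability) {
    return capability.isAssignableFrom(mergeClass);
  }

  public static void main(String[] args) {
    System.out.println(canMergeNatively(SparkAwareMerge.class, SupportsSparkRow.class));          // true
    System.out.println(canMergeNatively(SparkAwareMerge.class, SupportsHiveArrayWritable.class)); // false
  }
}
```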
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]