vinothchandar commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r891876288


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -123,6 +124,12 @@ public class HoodieWriteConfig extends HoodieConfig {
      .withDocumentation("Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting. "
          + "This will render any value set for PRECOMBINE_FIELD_OPT_VAL in-effective");
 
+  public static final ConfigProperty<String> COMBINE_ENGINE_CLASS_NAME = ConfigProperty

Review Comment:
   Can we make it such that we fall back to Avro-based merging (and thus support existing payloads, with a potential performance penalty) if an engine-optimized implementation does not exist? e.g. merging using Spark's row objects
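
   A minimal sketch of the fallback being asked for: prefer an engine-optimized merger (e.g. one operating on Spark's row objects) and fall back to Avro-based merging when none is registered. All class and method names below are illustrative stand-ins, not Hudi's actual API.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   public class MergerResolver {
     interface RecordMerger {}

     // Always available: merges via Avro records, so existing payloads keep
     // working, at a potential performance penalty.
     static class AvroRecordMerger implements RecordMerger {}

     // Engine-optimized: merges Spark row objects directly, avoiding the
     // round trip through Avro.
     static class SparkRowRecordMerger implements RecordMerger {}

     private final Map<String, RecordMerger> engineOptimized = new HashMap<>();

     public void register(String engine, RecordMerger merger) {
       engineOptimized.put(engine, merger);
     }

     // Fall back to Avro-based merging if no engine-optimized
     // implementation exists for this engine.
     public RecordMerger resolve(String engine) {
       return engineOptimized.getOrDefault(engine, new AvroRecordMerger());
     }
   }
   ```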
   



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -234,6 +235,12 @@ public class HoodieCompactionConfig extends HoodieConfig {
          + "the record payload class to merge records in the log against each other, merge again with the base file and "
          + "produce the final record to be written after compaction.");
 
+  public static final ConfigProperty<String> COMBINE_ENGINE_CLASS_NAME = ConfigProperty
+      .key("hoodie.compaction.combine.engine.class")
+      .defaultValue(HoodieAvroRecordCombiningEngine.class.getName())
+      .withDocumentation("Combine engine class to use for performing compactions, i.e merge delta logs with current base file and then "

Review Comment:
   This needs to be made clearer IMO. Can we make the doc more about what a "combine engine" is, rather than explaining what compaction does?
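
   For instance, the doc string could lead with what a combine engine is; the wording below is a suggestion only, not the PR's text:

   ```java
   public static final ConfigProperty<String> COMBINE_ENGINE_CLASS_NAME = ConfigProperty
       .key("hoodie.compaction.combine.engine.class")
       .defaultValue(HoodieAvroRecordCombiningEngine.class.getName())
       .withDocumentation("Combine engine class: encapsulates the logic for merging two versions of "
           + "a record into one, in a given record representation. During compaction, it is used to "
           + "merge records from the delta logs with those in the base file.");
   ```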



##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordCombiningEngine.java:
##########
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.hudi.common.util.Option;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.Properties;
+
+public interface HoodieRecordCombiningEngine extends Serializable {

Review Comment:
   Can we call this `HoodieMerge` instead? IIUC this is the new API that replaces the payload? Then I would love some higher-level discussion on the RFC first. 
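
   A hypothetical sketch of what a renamed `HoodieMerge` interface could look like. The shape mirrors the payload API (precombine + combine), but the generics and method names are guesses for discussion, not the PR's code.

   ```java
   import java.io.Serializable;
   import java.util.Optional;

   public interface HoodieMerge<T> extends Serializable {
     // Resolve two records with the same key within one incoming batch.
     T preCombine(T older, T newer);

     // Merge an incoming record with the currently stored one; an empty
     // Optional signals that the record should be deleted.
     Optional<T> combine(T current, T incoming);
   }
   ```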



##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##########
@@ -154,6 +155,12 @@ public class HoodieTableConfig extends HoodieConfig {
      .withDocumentation("Payload class to use for performing compactions, i.e merge delta logs with current base file and then "
          + " produce a new base file.");
 
+  public static final ConfigProperty<String> COMBINE_ENGINE_CLASS_NAME = ConfigProperty
+      .key("hoodie.compaction.combine.engine.class")
+      .defaultValue(HoodieAvroRecordCombiningEngine.class.getName())

Review Comment:
   So the same table can be read by Spark or Hive, for e.g.; I would love for such queries to be able to read and merge using the engine's POJO (Spark -> Row, Hive -> ArrayWritable), if the user chooses to write such a "combining engine" implementation. My thought here is that we should be able to discover whether such an ability exists for the class here. 
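
   One way the discovery could work, as a hedged sketch: a query engine asks the configured combining-engine class which record representations it can merge natively, and only falls back to Avro when its native one (Spark Row, Hive ArrayWritable, ...) is unsupported. Names below are illustrative, not Hudi's actual API.

   ```java
   import java.util.Set;

   public class CombineCapability {
     public interface CombiningEngine {
       // Representations this implementation can merge without first
       // converting records to Avro.
       Set<String> supportedRepresentations();
     }

     public static class AvroOnlyEngine implements CombiningEngine {
       public Set<String> supportedRepresentations() {
         return Set.of("avro");
       }
     }

     public static class SparkAwareEngine implements CombiningEngine {
       public Set<String> supportedRepresentations() {
         return Set.of("avro", "spark-row");
       }
     }

     // A Spark query would probe for "spark-row"; a Hive query for its
     // ArrayWritable-based representation.
     public static boolean canMergeNatively(CombiningEngine engine, String representation) {
       return engine.supportedRepresentations().contains(representation);
     }
   }
   ```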


