alexeykudinkin commented on code in PR #5629:
URL: https://github.com/apache/hudi/pull/5629#discussion_r938154747
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordMerger.java:
##########
@@ -30,9 +34,19 @@
* It can implement the merging logic of HoodieRecord of different engines
* and avoid the performance consumption caused by the serialization/deserialization of Avro payload.
*/
-public interface HoodieMerge extends Serializable {
-
- HoodieRecord preCombine(HoodieRecord older, HoodieRecord newer);
+@PublicAPIClass(maturity = ApiMaturityLevel.EVOLVING)
+public interface HoodieRecordMerger extends Serializable {
+
+ /**
+ * This method converges combineAndGetUpdateValue and precombine from HoodiePayload.
+ * It'd be associative operation: f(a, f(b, c)) = f(f(a, b), c) (which we can translate as having 3 versions A, B, C
+ * of the single record, both orders of operations applications have to yield the same result)
+ */
+ Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException;
- Option<HoodieRecord> combineAndGetUpdateValue(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException;
+ /**
+ * The record type handled by the current merger.
+ * SPARK, AVRO, FLINK
+ */
+ HoodieRecordType getRecordType();
Review Comment:
@wzx140 we should actually do this the same way the `KeyGenerator` interface is currently implemented:
- There is a generic API (`KeyGenerator`) that accepts Avro. This is the bare minimum to implement, and it is used as the fallback.
- There is an engine-specific API (`SparkKeyGeneratorInterface`) that exposes engine-native methods, which you should implement for better performance.
So when Acme implements their own `RecordMerger`, they can choose to implement either:
1. The bare-minimum API (Avro only), which works across engines but will lack performance.
2. Or the full set (Avro, Spark, etc.), which will be performant across engines.
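
To make the two options concrete, here is a rough sketch of the layering I have in mind. Everything in it is a hypothetical stand-in and not Hudi's actual API: a `Map` plays the role of the generic (Avro) record, an `Object[]` plays the role of an engine-native row, and the names `RecordMergerSketch`, `NativeRowMergerSketch`, and `MergeDispatchSketch` are made up for illustration. The "keep the larger `ts`" rule is also just an assumed merge policy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins only: Map ~ generic (Avro-like) record,
// Object[] ~ engine-native row. None of these are Hudi classes.

/** Bare-minimum API: every merger implements the generic path. */
interface RecordMergerSketch {
    Optional<Map<String, Object>> merge(Map<String, Object> older, Map<String, Object> newer);
}

/** Optional engine-specific API, implemented natively for better performance. */
interface NativeRowMergerSketch extends RecordMergerSketch {
    Optional<Object[]> mergeNative(Object[] older, Object[] newer);
}

/** Option 1: Acme implements only the generic path (assumed policy: larger "ts" wins). */
class AcmeGenericMerger implements RecordMergerSketch {
    @Override
    public Optional<Map<String, Object>> merge(Map<String, Object> older, Map<String, Object> newer) {
        long olderTs = (Long) older.get("ts");
        long newerTs = (Long) newer.get("ts");
        return Optional.of(newerTs >= olderTs ? newer : older);
    }
}

/** Option 2: Acme also implements the native path, with the same semantics. */
class AcmeFullMerger extends AcmeGenericMerger implements NativeRowMergerSketch {
    @Override
    public Optional<Object[]> mergeNative(Object[] older, Object[] newer) {
        // Layout assumed for this sketch: column 0 = key, column 1 = ts.
        return Optional.of((Long) newer[1] >= (Long) older[1] ? newer : older);
    }
}

/** Engine-side dispatch: native path when available, generic fallback otherwise. */
final class MergeDispatchSketch {
    static final String[] COLUMNS = {"key", "ts"};

    static Optional<Object[]> mergeRows(RecordMergerSketch merger, Object[] older, Object[] newer) {
        if (merger instanceof NativeRowMergerSketch) {
            return ((NativeRowMergerSketch) merger).mergeNative(older, newer);
        }
        // Fallback: convert to the generic form, merge, convert back (the slow path).
        return merger.merge(toGeneric(older), toGeneric(newer)).map(MergeDispatchSketch::toRow);
    }

    static Map<String, Object> toGeneric(Object[] row) {
        Map<String, Object> rec = new HashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            rec.put(COLUMNS[i], row[i]);
        }
        return rec;
    }

    static Object[] toRow(Map<String, Object> rec) {
        Object[] row = new Object[COLUMNS.length];
        for (int i = 0; i < COLUMNS.length; i++) {
            row[i] = rec.get(COLUMNS[i]);
        }
        return row;
    }
}

class MergerDemo {
    public static void main(String[] args) {
        Object[] older = {"k1", 1L};
        Object[] newer = {"k1", 2L};
        // Both mergers pick the same winner; only the code path differs.
        Object[] viaFallback = MergeDispatchSketch.mergeRows(new AcmeGenericMerger(), older, newer).get();
        Object[] viaNative = MergeDispatchSketch.mergeRows(new AcmeFullMerger(), older, newer).get();
        System.out.println(viaFallback[1] + " " + viaNative[1]);
    }
}
```

The point of the layering is that the engine never needs a `getRecordType()` switch: it checks whether the merger offers the native interface and otherwise takes the generic path, paying the conversion cost only in that case.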