linliu-code commented on code in PR #9809:
URL: https://github.com/apache/hudi/pull/9809#discussion_r1344411636
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordMerger.java:
##########
@@ -46,6 +46,28 @@ public interface HoodieRecordMerger extends Serializable {
*/
Option<Pair<HoodieRecord, Schema>> merge(HoodieRecord older, Schema oldSchema, HoodieRecord newer, Schema newSchema, TypedProperties props) throws IOException;
+
+ /**
+ * In some cases a business logic does some checks before flushing a merged record to the disk.
+ * This method does the check and the returned value contains two boolean variables.
+ * <p>
+ * The first variable indicates if the merged record should be flushed to the disk or not.
+ * The second variable takes effect only when the first one is false, and it indicates if
+ * the old record should be kept or not. That is,
+ * (1) (true, _): the merged one is flushed to the disk; the old record is skipped.
+ * (2) (false, false): both records skipped, a delete operation.
+ * (3) (false, true): only the old record flushed to the disk.
+ *
+ * @param record the merged record.
+ * @param schema the schema of the merged record.
+ * @return a pair of boolean variables to indicate the flush decision.
+ *
+ * <p> This interface is experimental and might be evolved in the future.
+ **/
+ default Pair<Boolean, Boolean> shouldFlush(HoodieRecord record, Schema schema, TypedProperties props) throws IOException {
Review Comment:
> > This question could be very critical,
>
> I didn't see such request from any user, even for the contributor from Kuaishou, they just want to keep the merged record or drop it totally. Let's not introduce new semantics if there is no real use case as back-up.
>
> We can evolve the returned value as a `Pair` or `Enum` if there are more feedbacks, at this time point, the behavior for keeping the old record seems not clear to me.

Even in the current implementation of `HoodieMergeHandle`, we still face this problem: when `shouldFlush` returns false, should `writeRecord` return true or false? Returning true means skipping the old record; returning false means keeping it. Whichever one we choose in advance, the same question remains: what if a user wants the other behavior?
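
To make the three cases from the Javadoc concrete, here is a minimal sketch of how the two booleans could collapse into a single decision. Everything in it is hypothetical: `FlushDecisionSketch`, `Outcome`, `decide`, and `skipOldRecord` are illustrative names and not part of Hudi, and the `writeRecord` contract (true skips the old record, false keeps it) is assumed only from the description in this thread.

```java
// Hypothetical sketch; none of these names exist in Hudi.
final class FlushDecisionSketch {

  /** The three outcomes listed in the shouldFlush Javadoc. */
  enum Outcome {
    WRITE_MERGED, // (true, _): merged record is written, old record is skipped
    DROP_BOTH,    // (false, false): neither record is written, i.e. a delete
    KEEP_OLD      // (false, true): only the old record is written
  }

  /** Maps the (flushMerged, keepOld) pair onto a single outcome. */
  static Outcome decide(boolean flushMerged, boolean keepOld) {
    if (flushMerged) {
      return Outcome.WRITE_MERGED; // keepOld is ignored in this case
    }
    return keepOld ? Outcome.KEEP_OLD : Outcome.DROP_BOTH;
  }

  /**
   * Assumed writeRecord-style result as described in this thread:
   * true means the old record is skipped, false means it is kept.
   */
  static boolean skipOldRecord(Outcome outcome) {
    return outcome != Outcome.KEEP_OLD;
  }
}
```

With a single boolean return value, only two of these three outcomes can be expressed, so the handle has to hard-code either `DROP_BOTH` or `KEEP_OLD` for the "do not flush" case.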