danny0405 commented on code in PR #13865:
URL: https://github.com/apache/hudi/pull/13865#discussion_r2335447210


##########
rfc/rfc-101/rfc-101.md:
##########
@@ -0,0 +1,71 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# RFC-101: Updates to the HoodieRecordMerger API
+
+## Proposers
+
+- @the-other-tim-brown
+
+## Approvers
+ - @<approver1 github username>
+ - @<approver2 github username>
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-9798
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+The HoodieRecordMerger interface allows users to implement custom merging 
logic for records. The interface does not provide clear instructions on how to 
handle deletions and can currently lead to differences in behavior based on 
execution engine.
+This RFC proposes updates to the interface to clarify the expectations so that 
out of order updates and deletes are handled properly across implementations.
+
+## Background
+Currently, the HoodieRecordMerger interface implementations within the Hudi 
codebase are not consistent when it comes to how deletions are handled and can 
lead to differences in the results when merging records as part of the writer 
or reader path. Also, it is unclear from the current documentation of the 
interface for how the implementation should handle deleted records.
+
+As part of the migration to the new HoodieFileGroupReader, the merging logic 
has been refactored for the common use cases by creating a BufferedRecord, 
which wraps the data and key information about it such as the ordering values 
and record key, and RecordContext which provides a common interface for 
inspecting the underlying data. The new APIs can be used to simplify the 
implementation of the HoodieRecordMerger and also help users implement custom 
mergers that are engine agnostic.
+
+## Implementation
+The HoodieRecordMerger interface will be updated to no longer rely on the 
HoodieRecord class, and instead use the `BufferedRecord` and `RecordContext` 
classes. 
+The `BufferedRecord` class contains the data as well as the record key, 
ordering value, schema identifier, and the `HoodieOperation`.
+
+The methods in the interface will be updated as follows:
+```java
+<T> BufferedRecord<T> partialMerge(BufferedRecord<T> older, BufferedRecord<T> 
newer, Schema readerSchema, RecordContext<T> recordContext, TypedProperties 
props) throws IOException
+<T> BufferedRecord merge(BufferedRecord<T> older, BufferedRecord<T> newer, 
RecordContext<T> recordContext, TypedProperties props) throws IOException;

Review Comment:
   currently this is a chaos, a lot of places getch a write config props first 
then append some required table config options, a few places just use the table 
config props. It's a good oppurtunity to unify and standardize it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to