[
https://issues.apache.org/jira/browse/HUDI-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899563#comment-17899563
]
Y Ethan Guo commented on HUDI-8521:
-----------------------------------
h2. How merger is used in HoodieAvroRecordMerger and log record scanner
The merger is invoked in #processNextRecord. "records" is a spillable map that
stores the merged records from the log files. If we assume the log file does
not have duplicate keys, which is the case (the writer dedups the input batch),
the merge call here only operates on records from different batches. So the
"preCombine" merger or logic should not be invoked here.
{code:java}
@Override
protected <T> void processNextRecord(HoodieRecord<T> newRecord) throws IOException {
  String key = newRecord.getRecordKey();
  HoodieRecord<T> prevRecord = records.get(key);
  if (prevRecord != null) {
    // Merge and store the combined record
    HoodieRecord<T> combinedRecord = (HoodieRecord<T>) recordMerger.merge(prevRecord, readerSchema,
        newRecord, readerSchema, this.getPayloadProps()).get().getLeft();
    // If pre-combine returns existing record, no need to update it
    if (combinedRecord.getData() != prevRecord.getData()) {
      HoodieRecord latestHoodieRecord = getLatestHoodieRecord(newRecord, combinedRecord, key);
      // NOTE: Record have to be cloned here to make sure if it holds low-level engine-specific
      //       payload pointing into a shared, mutable (underlying) buffer we get a clean copy of
      //       it since these records will be put into records(Map).
      records.put(key, latestHoodieRecord.copy());
    }
  } else {
    // Put the record as is
    // NOTE: Record have to be cloned here to make sure if it holds low-level engine-specific
    //       payload pointing into a shared, mutable (underlying) buffer we get a clean copy of
    //       it since these records will be put into records(Map).
    records.put(key, newRecord.copy());
  }
}
{code}
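To make the cross-batch behavior concrete, here is a minimal, self-contained sketch of the same control flow. It is not Hudi code: LogMergeSketch, Rec, keepLatest and keepHigherOrdering are hypothetical names, a plain HashMap stands in for the spillable map, and the two merge functions are only rough stand-ins for commit-time and event-time style ordering.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Illustrative only: these types are hypothetical stand-ins, not Hudi classes.
final class LogMergeSketch {

  // Simplified record: key, ordering value, and an opaque payload.
  static final class Rec {
    final String key;
    final long orderingValue;
    final String payload;

    Rec(String key, long orderingValue, String payload) {
      this.key = key;
      this.orderingValue = orderingValue;
      this.payload = payload;
    }
  }

  // Stand-in for the scanner's "records" map (a plain HashMap here, not spillable).
  private final Map<String, Rec> records = new HashMap<>();
  // Stand-in for the record merger: (older, newer) -> surviving record.
  private final BinaryOperator<Rec> merger;

  LogMergeSketch(BinaryOperator<Rec> merger) {
    this.merger = merger;
  }

  // Same shape as processNextRecord: merge only when the key was already seen,
  // i.e. when the previous record came from an earlier batch.
  void processNextRecord(Rec newRecord) {
    Rec prevRecord = records.get(newRecord.key);
    if (prevRecord == null) {
      records.put(newRecord.key, newRecord);
      return;
    }
    Rec combined = merger.apply(prevRecord, newRecord);
    // If the merger returns the existing record, there is nothing to update.
    if (combined != prevRecord) {
      records.put(newRecord.key, combined);
    }
  }

  Map<String, Rec> merged() {
    return records;
  }

  // Assumed commit-time style ordering: the record from the later batch wins.
  static Rec keepLatest(Rec older, Rec newer) {
    return newer;
  }

  // Assumed event-time style ordering: the higher ordering value wins, ties go to the newer record.
  static Rec keepHigherOrdering(Rec older, Rec newer) {
    return newer.orderingValue >= older.orderingValue ? newer : older;
  }

  public static void main(String[] args) {
    LogMergeSketch scanner = new LogMergeSketch(LogMergeSketch::keepHigherOrdering);
    scanner.processNextRecord(new Rec("k1", 5L, "batch-1"));
    scanner.processNextRecord(new Rec("k1", 3L, "batch-2"));
    System.out.println(scanner.merged().get("k1").payload); // prints batch-1
  }
}
```

With keepLatest plugged in instead, the same two calls would leave the "batch-2" record in the map. The point of the sketch is that the merge function only ever sees one record per key from each batch, so batch-internal dedup ("preCombine") has no role at this step.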
> Resolve issues w/ diff merge modes
> -----------------------------------
>
> Key: HUDI-8521
> URL: https://issues.apache.org/jira/browse/HUDI-8521
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, writer-core
> Reporter: sivabalan narayanan
> Assignee: Lin Liu
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.0.0
>
>
> we found some issues w/ merge mode feature in 1.x.
> we need to triage them and fix
--
This message was sent by Atlassian Jira
(v8.20.10#820010)