[ 
https://issues.apache.org/jira/browse/HUDI-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899563#comment-17899563
 ] 

Y Ethan Guo commented on HUDI-8521:
-----------------------------------

h2. How the merger is used in HoodieAvroRecordMerger and the log record scanner
The merger is invoked in #processNextRecord, where "records" is a spillable map
that stores the merged records read from log files. If we assume a log file
does not contain duplicate keys, which holds because the writer dedups each
input batch, then the merge call here only combines records from different
batches. So the "preCombine" merger or logic should not be invoked here.

{code:java}
  @Override
  protected <T> void processNextRecord(HoodieRecord<T> newRecord) throws IOException {
    String key = newRecord.getRecordKey();
    HoodieRecord<T> prevRecord = records.get(key);
    if (prevRecord != null) {
      // Merge and store the combined record
      HoodieRecord<T> combinedRecord = (HoodieRecord<T>) recordMerger.merge(prevRecord, readerSchema,
          newRecord, readerSchema, this.getPayloadProps()).get().getLeft();
      // If pre-combine returns existing record, no need to update it
      if (combinedRecord.getData() != prevRecord.getData()) {
        HoodieRecord latestHoodieRecord = getLatestHoodieRecord(newRecord, combinedRecord, key);

        // NOTE: Record have to be cloned here to make sure if it holds low-level engine-specific
        //       payload pointing into a shared, mutable (underlying) buffer we get a clean copy of
        //       it since these records will be put into records(Map).
        records.put(key, latestHoodieRecord.copy());
      }
    } else {
      // Put the record as is
      // NOTE: Record have to be cloned here to make sure if it holds low-level engine-specific
      //       payload pointing into a shared, mutable (underlying) buffer we get a clean copy of
      //       it since these records will be put into records(Map).
      records.put(key, newRecord.copy());
    }
  }
{code}
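
To make the cross-batch behavior concrete, here is a minimal, self-contained sketch of the same map-and-merge flow. It is not Hudi code: the class LogRecordMergeSketch, the Rec type, and the merge rule (higher ordering value wins, ties go to the newer record) are all hypothetical stand-ins, and a plain HashMap replaces the spillable map. It only illustrates that, once each batch is deduplicated, every merge call pairs a stored record with a record from a later batch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the scanner's merge step; not Hudi's API.
class LogRecordMergeSketch {

  // Stand-in for HoodieRecord: a key, an ordering value, and a payload.
  static final class Rec {
    final String key;
    final long orderingValue;
    final String data;

    Rec(String key, long orderingValue, String data) {
      this.key = key;
      this.orderingValue = orderingValue;
      this.data = data;
    }
  }

  // Stand-in for the spillable "records" map (a plain HashMap here).
  private final Map<String, Rec> records = new HashMap<>();

  // Assumed merge rule: higher ordering value wins; ties go to the newer record.
  static Rec merge(Rec prev, Rec next) {
    return next.orderingValue >= prev.orderingValue ? next : prev;
  }

  // Mirrors the shape of processNextRecord: merge against the stored record, if any.
  void processNextRecord(Rec newRecord) {
    Rec prev = records.get(newRecord.key);
    if (prev == null) {
      // First time this key is seen: store the record as is.
      records.put(newRecord.key, newRecord);
      return;
    }
    Rec combined = merge(prev, newRecord);
    // Only touch the map when the merge did not return the existing record.
    if (combined != prev) {
      records.put(newRecord.key, combined);
    }
  }

  Rec get(String key) {
    return records.get(key);
  }

  public static void main(String[] args) {
    LogRecordMergeSketch scanner = new LogRecordMergeSketch();
    // Batch 1 (already deduplicated by the writer): one record per key.
    scanner.processNextRecord(new Rec("k1", 5, "batch1"));
    // Batch 2 carries an older version of k1, so the stored record is kept.
    scanner.processNextRecord(new Rec("k1", 3, "batch2"));
    // Batch 3 carries a newer version of k1, so it replaces the stored record.
    scanner.processNextRecord(new Rec("k1", 7, "batch3"));
    System.out.println(scanner.get("k1").data); // prints batch3
  }
}
```

In this sketch no call to merge ever sees two records from the same batch, which is why a within-batch "preCombine" step has nothing to do at this point.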

> Resolve issues w/ diff merge modes 
> -----------------------------------
>
>                 Key: HUDI-8521
>                 URL: https://issues.apache.org/jira/browse/HUDI-8521
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: reader-core, writer-core
>            Reporter: sivabalan narayanan
>            Assignee: Lin Liu
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>
> We found some issues with the merge mode feature in 1.x.
> We need to triage and fix them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
