nsivabalan commented on code in PR #13952:
URL: https://github.com/apache/hudi/pull/13952#discussion_r2366247859


##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/buffer/StreamingFileGroupRecordBufferLoader.java:
##########
@@ -86,21 +84,8 @@ public Pair<HoodieFileGroupRecordBuffer<T>, List<String>> getRecordBuffer(Hoodie
     deleteContext.withReaderSchema(recordSchema);
     while (recordIterator.hasNext()) {
       HoodieRecord<T> hoodieRecord = recordIterator.next();
-      T data = recordContext.extractDataFromRecord(hoodieRecord, recordSchema, props);
       try {
-        // we use -U operation to represent the record should be ignored during updating index.
-        HoodieOperation hoodieOperation = hoodieRecord.getIgnoreIndexUpdate() ? HoodieOperation.UPDATE_BEFORE : hoodieRecord.getOperation();
-        BufferedRecord<T> bufferedRecord;
-        if (data == null) {
-          DeleteRecord deleteRecord = DeleteRecord.create(hoodieRecord.getKey(), hoodieRecord.getOrderingValue(recordSchema, props, orderingFieldsArray));
-          bufferedRecord = BufferedRecords.fromDeleteRecord(deleteRecord, recordContext, hoodieOperation);
-        } else {
-          // HoodieRecord#isDelete does not check if a record is a DELETE marked by a custom delete marker,
-          // so we use recordContext#isDeleteRecord here if the data field is not null.
-          boolean isDelete = recordContext.isDeleteRecord(data, deleteContext);
-          bufferedRecord = BufferedRecords.fromEngineRecord(data, hoodieRecord.getRecordKey(), recordSchema, recordContext, orderingFieldNames,
-              BufferedRecords.inferOperation(isDelete, hoodieOperation));
-        }
+        BufferedRecord<T> bufferedRecord = BufferedRecords.fromHoodieRecord(hoodieRecord, recordSchema, recordContext, props, orderingFieldsArray);

Review Comment:
   Looks like the ordering value in `HoodieRecord` is transient. We might have
to fix that.
   Essentially, whatever info `BufferedRecord` needs in order to be
instantiated, we should capture while creating the record, and after that we
should never poll `data` to fetch any of the already-known information. With
this, for commit time ordering and event time ordering we never have to look
into `data` at all.
   
   That would leave only the custom payload and custom merger cases needing to
look into `data`, even within the task, for merge purposes.
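   
   To make the idea concrete, here is a rough sketch of what "capture
everything at creation time" could look like. This is not Hudi code:
`EagerMetadataSketch`, `SketchRecord`, `mergeCommitTime` and `mergeEventTime`
are hypothetical names made up for illustration, the ordering value is
simplified to a `long`, and the tie-breaking rule is an assumption.

```java
import java.util.function.Predicate;
import java.util.function.ToLongFunction;

public class EagerMetadataSketch {

  /** Hypothetical stand-in for a buffered record; NOT Hudi's BufferedRecord. */
  public static final class SketchRecord<T> {
    public final String recordKey;
    public final long orderingValue;
    public final boolean isDelete;
    public final T data; // opaque payload; only custom payloads/mergers would read it

    private SketchRecord(String recordKey, long orderingValue, boolean isDelete, T data) {
      this.recordKey = recordKey;
      this.orderingValue = orderingValue;
      this.isDelete = isDelete;
      this.data = data;
    }

    /** Reads what it needs from the payload exactly once, at creation. */
    public static <T> SketchRecord<T> create(String recordKey, T data,
        ToLongFunction<T> orderingExtractor, Predicate<T> deleteMarker) {
      boolean isDelete = data == null || deleteMarker.test(data);
      // Assumption for this sketch: a null payload carries no ordering value, so use 0.
      long orderingValue = data == null ? 0L : orderingExtractor.applyAsLong(data);
      return new SketchRecord<>(recordKey, orderingValue, isDelete, data);
    }
  }

  /** Commit time ordering: the later write wins; `data` is never touched. */
  public static <T> SketchRecord<T> mergeCommitTime(SketchRecord<T> older, SketchRecord<T> newer) {
    return newer;
  }

  /** Event time ordering: the higher ordering value wins; ties go to the newer record (assumed). */
  public static <T> SketchRecord<T> mergeEventTime(SketchRecord<T> older, SketchRecord<T> newer) {
    return newer.orderingValue >= older.orderingValue ? newer : older;
  }

  public static void main(String[] args) {
    // Toy payload layout: {eventTime, deleteFlag}.
    ToLongFunction<long[]> eventTime = p -> p[0];
    Predicate<long[]> deleted = p -> p[1] == 1L;

    SketchRecord<long[]> older = SketchRecord.create("k1", new long[] {20L, 0L}, eventTime, deleted);
    SketchRecord<long[]> newer = SketchRecord.create("k1", new long[] {10L, 1L}, eventTime, deleted);

    System.out.println(mergeEventTime(older, newer).orderingValue); // 20
    System.out.println(mergeCommitTime(older, newer).isDelete);     // true
  }
}
```

   Both merge paths above read only fields populated in `create`, which is the
property being asked for; a custom merger would be the only caller that still
dereferences `data`.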
   


