nsivabalan commented on code in PR #13952:
URL: https://github.com/apache/hudi/pull/13952#discussion_r2366247859
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/buffer/StreamingFileGroupRecordBufferLoader.java:
##########
@@ -86,21 +84,8 @@ public Pair<HoodieFileGroupRecordBuffer<T>, List<String>> getRecordBuffer(Hoodie
     deleteContext.withReaderSchema(recordSchema);
     while (recordIterator.hasNext()) {
       HoodieRecord<T> hoodieRecord = recordIterator.next();
-      T data = recordContext.extractDataFromRecord(hoodieRecord, recordSchema, props);
       try {
-        // we use -U operation to represent the record should be ignored during updating index.
-        HoodieOperation hoodieOperation = hoodieRecord.getIgnoreIndexUpdate() ? HoodieOperation.UPDATE_BEFORE : hoodieRecord.getOperation();
-        BufferedRecord<T> bufferedRecord;
-        if (data == null) {
-          DeleteRecord deleteRecord = DeleteRecord.create(hoodieRecord.getKey(), hoodieRecord.getOrderingValue(recordSchema, props, orderingFieldsArray));
-          bufferedRecord = BufferedRecords.fromDeleteRecord(deleteRecord, recordContext, hoodieOperation);
-        } else {
-          // HoodieRecord#isDelete does not check if a record is a DELETE marked by a custom delete marker,
-          // so we use recordContext#isDeleteRecord here if the data field is not null.
-          boolean isDelete = recordContext.isDeleteRecord(data, deleteContext);
-          bufferedRecord = BufferedRecords.fromEngineRecord(data, hoodieRecord.getRecordKey(), recordSchema, recordContext, orderingFieldNames,
-              BufferedRecords.inferOperation(isDelete, hoodieOperation));
-        }
+        BufferedRecord<T> bufferedRecord = BufferedRecords.fromHoodieRecord(hoodieRecord, recordSchema, recordContext, props, orderingFieldsArray);
Review Comment:
Looks like the ordering value in `HoodieRecord` is transient. We might have to
fix that.
Essentially, whatever info `BufferedRecord` needs in order to be instantiated,
we should capture while creating the record, and after that we should never
poll the `data` to fetch any of that known information. With this, for commit
time ordering and event time ordering we never have to look into the `data`
at all.
That would leave only the custom payload and custom merger cases, which would
still need to look into `data` within the task for merge purposes.
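
To make the suggestion above concrete, here is a minimal, self-contained sketch of the idea. It is not Hudi code: `EagerMetadataSketch`, `SketchRecord`, `fromData`, `fromDelete`, and the two merge helpers are hypothetical names, and the payload is modelled as a plain `Map` with a numeric ordering field. The point is only that the ordering value and the delete flag are read once, at creation time, so commit time and event time merging never touch `data` again.

```java
import java.util.HashMap;
import java.util.Map;

public class EagerMetadataSketch {

  /** Hypothetical stand-in for a buffered record; NOT Hudi's BufferedRecord. */
  static final class SketchRecord {
    final String key;
    final long orderingValue;       // captured once at creation
    final boolean isDelete;         // captured once at creation
    final Map<String, Object> data; // only custom payloads/mergers would read this later

    private SketchRecord(String key, long orderingValue, boolean isDelete, Map<String, Object> data) {
      this.key = key;
      this.orderingValue = orderingValue;
      this.isDelete = isDelete;
      this.data = data;
    }

    /** Reads the payload exactly once to extract everything the merge needs. */
    static SketchRecord fromData(String key, Map<String, Object> data,
                                 String orderingField, String deleteMarkerField) {
      Object ordering = data.get(orderingField);
      // Assumption for this sketch: a missing or non-numeric ordering field maps to 0.
      long orderingValue = ordering instanceof Number ? ((Number) ordering).longValue() : 0L;
      boolean isDelete = Boolean.TRUE.equals(data.get(deleteMarkerField));
      return new SketchRecord(key, orderingValue, isDelete, data);
    }

    /** A delete with no payload: the ordering value must be supplied by the caller. */
    static SketchRecord fromDelete(String key, long orderingValue) {
      return new SketchRecord(key, orderingValue, true, null);
    }
  }

  /** Event time ordering: compares cached ordering values only; ties go to the newer record. */
  static SketchRecord mergeEventTime(SketchRecord older, SketchRecord newer) {
    return newer.orderingValue >= older.orderingValue ? newer : older;
  }

  /** Commit time ordering: the later write always wins; no payload access needed. */
  static SketchRecord mergeCommitTime(SketchRecord older, SketchRecord newer) {
    return newer;
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    row.put("ts", 10L);
    row.put("_deleted", false);
    SketchRecord insert = SketchRecord.fromData("k1", row, "ts", "_deleted");
    SketchRecord lateDelete = SketchRecord.fromDelete("k1", 5L);

    System.out.println(mergeEventTime(insert, lateDelete).isDelete);
    System.out.println(mergeCommitTime(insert, lateDelete).isDelete);
  }
}
```

In this sketch the out-of-order delete (ordering value 5) loses to the insert (ordering value 10) under event time ordering but wins under commit time ordering, and neither merge reads `data`. A custom payload or custom merger would still need the `data` field, which matches the exception called out above.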