prashantwason commented on code in PR #9058:
URL: https://github.com/apache/hudi/pull/9058#discussion_r1247188452


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##########
@@ -283,6 +285,8 @@ public HoodieMetadataPayload(Option<GenericRecord> recordOpt) {
             Integer.parseInt(recordIndexRecord.get(RECORD_INDEX_FIELD_FILE_INDEX).toString()),
             Long.parseLong(recordIndexRecord.get(RECORD_INDEX_FIELD_INSTANT_TIME).toString()));
       }
+    } else {
+      this.isDeletedRecord = true;

Review Comment:
   >> I would favor isDeleted field in HoodieRecordIndexInfo in the schema.

   With this design, every delete requires:
   1. First reading the existing record from the MDT
   2. Adding an upsert to the log files
   3. Removing the deleted record at compaction time
   
   This has performance implications for larger indexes like the RI.
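   The three-step cost can be sketched as follows (illustrative only; these are hypothetical names, not Hudi's actual classes):

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Illustrative sketch only -- hypothetical names, not Hudi's actual classes.
   // It models the cost of an isDeleted flag in the record index: read the
   // existing entry, upsert a flagged copy, then purge flagged entries at
   // compaction time.
   public class IsDeletedFlagSketch {

     /** Hypothetical record-index entry carrying a soft-delete flag. */
     public static final class IndexEntry {
       public final String location;
       public final boolean isDeleted;
       public IndexEntry(String location, boolean isDeleted) {
         this.location = location;
         this.isDeleted = isDeleted;
       }
     }

     /** Steps 1-2: read the existing record, then upsert a version with the flag set. */
     public static void softDelete(Map<String, IndexEntry> mdt, String key) {
       IndexEntry existing = mdt.get(key);                     // step 1: one read per delete
       if (existing != null) {
         mdt.put(key, new IndexEntry(existing.location, true)); // step 2: upsert flagged copy
       }
     }

     /** Step 3: compaction finally removes the flagged entries. */
     public static void compact(Map<String, IndexEntry> mdt) {
       mdt.values().removeIf(e -> e.isDeleted);
     }
   }
   ```

   Note that step 1 implies one index read per deleted record, which is where the cost shows up for a large RI.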
   
   With the design in this PR, a DELETE from the RI is treated exactly like a DELETE from the dataset: we write a DELETE block to the log file and the existing MOR code takes care of it. This is simple.
   
   The reason we could not use this design for the rest of the MDT is that there is no use case in the MDT where we actually DELETE an entire record. For example:
   1. If a file is deleted (during clean), we need to modify the partition_file_list_record to remove that single file
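   For contrast, here is a minimal sketch of the DELETE-block path (again with hypothetical names, not Hudi's real API): the block carries only the record keys, so no prior read of the index entry is needed, and the merge-on-read step simply drops any base record whose key appears in a DELETE block.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Set;

   // Illustrative sketch only -- hypothetical names, not Hudi's actual API.
   // A DELETE block carries just record keys; merging drops tombstoned entries
   // without ever reading the existing index record.
   public class DeleteBlockSketch {

     /** Merge a base file's entries with the keys from a DELETE log block. */
     public static Map<String, String> merge(Map<String, String> baseRecords,
                                             Set<String> deleteBlockKeys) {
       Map<String, String> merged = new HashMap<>();
       for (Map.Entry<String, String> e : baseRecords.entrySet()) {
         if (!deleteBlockKeys.contains(e.getKey())) { // tombstoned keys vanish
           merged.put(e.getKey(), e.getValue());
         }
       }
       return merged;
     }
   }
   ```

   The delete path here is write-only: the writer emits keys into the DELETE block, and the read/compaction side does the removal.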
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
