nsivabalan commented on code in PR #9581:
URL: https://github.com/apache/hudi/pull/9581#discussion_r1313612529


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieDeleteBlock.java:
##########
@@ -65,17 +69,44 @@ public class HoodieDeleteBlock extends HoodieLogBlock {
  private static final Lazy<HoodieDeleteRecord.Builder> HOODIE_DELETE_RECORD_BUILDER_STUB =
      Lazy.lazily(HoodieDeleteRecord::newBuilder);
 
+  private final boolean writeRecordPositions;
+  // Records to delete, sorted based on the record position if writing record position to the log block header
   private DeleteRecord[] recordsToDelete;
 
-  public HoodieDeleteBlock(DeleteRecord[] recordsToDelete, Map<HeaderMetadataType, String> header) {
-    this(Option.empty(), null, false, Option.empty(), header, new HashMap<>());
-    this.recordsToDelete = recordsToDelete;
+  public HoodieDeleteBlock(List<Pair<DeleteRecord, Long>> recordsToDelete,
+                           boolean writeRecordPositions,
+                           Map<HeaderMetadataType, String> header) {
+    this(Option.empty(), null, false, Option.empty(), header, new HashMap<>(), writeRecordPositions);
+    if (writeRecordPositions) {
+      recordsToDelete.sort((o1, o2) -> {
+        long v1 = o1.getRight();
+        long v2 = o2.getRight();
+        return Long.compare(v1, v2);
+      });
+      if (recordsToDelete.get(0).getRight() > -1L) {
+        addRecordPositionsToHeader(

Review Comment:
   May I know under what circumstances we will get -1 values here?
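
   For context, a minimal sketch of the sort-and-check pattern in the diff above (names are hypothetical; `Entry<String, Long>` stands in for the `Pair<DeleteRecord, Long>` pairs, and -1 is assumed to encode "position unknown"):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

public class PositionSortSketch {
  // Assumption: -1 encodes "record position unknown", e.g. the record was
  // never located in a base file. Entry<String, Long> is a stand-in for
  // Pair<DeleteRecord, Long> from the PR.
  static final long UNKNOWN_POSITION = -1L;

  static boolean allPositionsKnown(List<Entry<String, Long>> recordsToDelete) {
    // Sort by position, as the HoodieDeleteBlock ctor does; after sorting,
    // any -1 sentinel ends up at index 0, so checking the first element is
    // enough to decide whether positions can go into the block header.
    recordsToDelete.sort((o1, o2) -> Long.compare(o1.getValue(), o2.getValue()));
    return !recordsToDelete.isEmpty()
        && recordsToDelete.get(0).getValue() > UNKNOWN_POSITION;
  }

  public static void main(String[] args) {
    List<Entry<String, Long>> recs = new ArrayList<>();
    recs.add(new SimpleEntry<>("key2", 5L));
    recs.add(new SimpleEntry<>("key1", -1L)); // position never resolved
    System.out.println(allPositionsKnown(recs)); // prints false
  }
}
```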



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieDataBlock.java:
##########
@@ -70,14 +75,31 @@ public abstract class HoodieDataBlock extends HoodieLogBlock {
    * NOTE: This ctor is used on the write-path (ie when records ought to be written into the log)
    */
   public HoodieDataBlock(List<HoodieRecord> records,
+                         boolean writeRecordPositions,
                          Map<HeaderMetadataType, String> header,
                          Map<HeaderMetadataType, String> footer,
                          String keyFieldName) {
     super(header, footer, Option.empty(), Option.empty(), null, false);
+    if (writeRecordPositions) {
+      records.sort((o1, o2) -> {

Review Comment:
   Where do we guard against the condition below?
   In the case of async compaction, the latest parquet file may not exist.
   So a new log file being added to the latest file slice should not encode the write positions.
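
   A hedged sketch of the kind of guard this comment is asking about (the helper and its call site are hypothetical, not from the PR): record positions are only meaningful relative to a base file, so writing them should additionally require that the latest file slice has one.

```java
import java.util.Optional;

public class WritePositionsDecisionSketch {
  // Hypothetical decision helper illustrating the reviewer's point: even if
  // the write-positions config is enabled, positions should be skipped when
  // the latest file slice has no base (parquet) file, e.g. while an async
  // compaction is still pending.
  static boolean shouldWriteRecordPositions(boolean configEnabled,
                                            Optional<String> baseFilePath) {
    return configEnabled && baseFilePath.isPresent();
  }

  public static void main(String[] args) {
    // No base file yet (pending compaction) -> do not encode positions.
    System.out.println(shouldWriteRecordPositions(true, Optional.empty())); // false
    // Base file present -> positions may be encoded.
    System.out.println(shouldWriteRecordPositions(true, Optional.of("f1.parquet"))); // true
  }
}
```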
   



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieLogBlock.java:
##########
@@ -137,6 +140,20 @@ public Roaring64NavigableMap getRecordPositions() throws IOException {
     return LogReaderUtils.decodeRecordPositionsHeader(logBlockHeader.get(HeaderMetadataType.RECORD_POSITIONS));
   }
 
+  protected void addRecordPositionsToHeader(Set<Long> positionSet, int numRecords) {
+    if (positionSet.size() == numRecords) {

Review Comment:
   I see, it's here.



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieDataBlock.java:
##########
@@ -70,14 +75,31 @@ public abstract class HoodieDataBlock extends HoodieLogBlock {
    * NOTE: This ctor is used on the write-path (ie when records ought to be written into the log)
    */
   public HoodieDataBlock(List<HoodieRecord> records,
+                         boolean writeRecordPositions,
                          Map<HeaderMetadataType, String> header,
                          Map<HeaderMetadataType, String> footer,
                          String keyFieldName) {
     super(header, footer, Option.empty(), Option.empty(), null, false);
+    if (writeRecordPositions) {
+      records.sort((o1, o2) -> {

Review Comment:
   Where is the logic that avoids writing record positions when there are duplicate records?
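
   For reference, the `positionSet.size() == numRecords` check quoted above from `addRecordPositionsToHeader` appears to cover this; a minimal sketch of that guard (helper name hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

public class PositionHeaderGuardSketch {
  // Sketch of the guard shown in the HoodieLogBlock diff: positions are only
  // written when every record has a distinct position. If two records map to
  // the same position (duplicates), the Set collapses them, the size check
  // fails, and the header is left without positions.
  static boolean shouldWritePositions(Set<Long> positionSet, int numRecords) {
    return positionSet.size() == numRecords;
  }

  public static void main(String[] args) {
    Set<Long> positions = new HashSet<>();
    positions.add(0L);
    positions.add(1L);
    // 2 records, 2 distinct positions -> positions go into the header.
    System.out.println(shouldWritePositions(positions, 2)); // true
    // 3 records but only 2 distinct positions (a duplicate) -> skipped.
    System.out.println(shouldWritePositions(positions, 3)); // false
  }
}
```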



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
