yihua commented on code in PR #12866:
URL: https://github.com/apache/hudi/pull/12866#discussion_r2105505548
##########
hudi-io/src/main/java/org/apache/hudi/io/hfile/HFileDataBlock.java:
##########
@@ -149,4 +155,74 @@ public boolean next(HFileCursor cursor, int blockStartOffsetInFile) {
private boolean isAtFirstKey(int relativeOffset) {
return relativeOffset == HFILEBLOCK_HEADER_SIZE;
}
+
+ // ================ Below are for Write ================
+ protected final List<KeyValueEntry> entries = new ArrayList<>();
+
+ public HFileDataBlock(HFileContext context) {
+    this(context, -1L);
+ }
+
+ public HFileDataBlock(HFileContext context, long previousBlockOffset) {
+ super(context, HFileBlockType.DATA, previousBlockOffset);
+ // This is not used for write.
+ uncompressedContentEndRelativeOffset = -1;
+ }
+
+ public List<KeyValueEntry> getEntries() {
+ return entries;
+ }
+
+ public boolean isEmpty() {
+ return entries.isEmpty();
+ }
+
+ public void add(byte[] key, byte[] value) {
+ KeyValueEntry kv = new KeyValueEntry(key, value);
+ // Assume all entries are sorted before write.
+ add(kv, false);
+ }
+
+ public int getNumOfEntries() {
+ return entries.size();
+ }
+
+ protected void add(KeyValueEntry kv, boolean sorted) {
+ entries.add(kv);
+ if (sorted) {
+ entries.sort(KeyValueEntry::compareTo);
+ }
Review Comment:
It's better to avoid such a sort and enforce ordering from the caller (we can
add an internal check to make sure each new key added is lexicographically
increasing, and throw an exception if not). Also, could the writer keep writing
to a byte buffer instead of keeping a list of `KeyValueEntry`?
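A minimal sketch of what the reviewer suggests (the class and field names here, such as `SortedBlockWriter` and `lastKey`, are illustrative and not from the PR; the serialization also omits the real HFile key/value length prefixes):

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: reject out-of-order keys at add() time instead of
// sorting, and append serialized bytes to a growable stream instead of
// keeping a List<KeyValueEntry>.
public class SortedBlockWriter {
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private byte[] lastKey = null;

  public void add(byte[] key, byte[] value) {
    // Enforce lexicographically increasing keys; throw if violated.
    if (lastKey != null && compareBytes(lastKey, key) > 0) {
      throw new IllegalArgumentException("Keys must be added in lexicographic order");
    }
    lastKey = key;
    // Append directly to the byte stream (real HFile entries would also
    // carry length prefixes before the key and value bytes).
    buffer.write(key, 0, key.length);
    buffer.write(value, 0, value.length);
  }

  public byte[] toBytes() {
    return buffer.toByteArray();
  }

  // Unsigned lexicographic comparison, since HFile keys are byte-ordered.
  private static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }
}
```

This trades the O(n log n) sort on every sorted add for an O(key length) check per insert, and keeps memory proportional to the serialized payload rather than to a list of wrapper objects.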
##########
hudi-io/src/main/java/org/apache/hudi/io/hfile/HFileDataBlock.java:
##########
@@ -149,4 +155,74 @@ public boolean next(HFileCursor cursor, int blockStartOffsetInFile) {
private boolean isAtFirstKey(int relativeOffset) {
return relativeOffset == HFILEBLOCK_HEADER_SIZE;
}
+
+ // ================ Below are for Write ================
+ protected final List<KeyValueEntry> entries = new ArrayList<>();
+
+ public HFileDataBlock(HFileContext context) {
+    this(context, -1L);
+ }
+
+ public HFileDataBlock(HFileContext context, long previousBlockOffset) {
+ super(context, HFileBlockType.DATA, previousBlockOffset);
+ // This is not used for write.
+ uncompressedContentEndRelativeOffset = -1;
+ }
+
+ public List<KeyValueEntry> getEntries() {
+ return entries;
+ }
+
+ public boolean isEmpty() {
+ return entries.isEmpty();
+ }
+
+ public void add(byte[] key, byte[] value) {
+ KeyValueEntry kv = new KeyValueEntry(key, value);
+ // Assume all entries are sorted before write.
+ add(kv, false);
+ }
+
+ public int getNumOfEntries() {
+ return entries.size();
+ }
+
+ protected void add(KeyValueEntry kv, boolean sorted) {
+ entries.add(kv);
+ if (sorted) {
+ entries.sort(KeyValueEntry::compareTo);
+ }
+ }
+
+ public byte[] getFirstKey() {
+ return entries.get(0).key;
+ }
+
+ public byte[] getLastKeyContent() {
+ if (entries.isEmpty()) {
+ return new byte[0];
+ }
+ return entries.get(entries.size() - 1).key;
+ }
+
+ @Override
+ public ByteBuffer getPayload() {
+ ByteBuffer dataBuf = ByteBuffer.allocate(context.getBlockSize() * 2);
Review Comment:
Why `context.getBlockSize() * 2`?
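One alternative the question points toward is sizing the buffer exactly from the accumulated entries instead of guessing with a fixed multiple of the block size. A hedged sketch (the `PayloadSizer` class and the 4-byte length-prefix layout are assumptions for illustration, not the PR's actual encoding):

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical alternative to ByteBuffer.allocate(blockSize * 2): compute
// the exact payload size from the entries, then allocate once.
final class PayloadSizer {
  // Assumed layout per entry: keyLen(4) + valueLen(4) + key bytes + value
  // bytes; the real HFile on-disk layout may differ.
  static ByteBuffer allocateExact(List<byte[][]> entries) {
    int total = 0;
    for (byte[][] kv : entries) {
      total += 8 + kv[0].length + kv[1].length;
    }
    ByteBuffer buf = ByteBuffer.allocate(total);
    for (byte[][] kv : entries) {
      buf.putInt(kv[0].length).putInt(kv[1].length).put(kv[0]).put(kv[1]);
    }
    buf.flip();
    return buf;
  }
}
```

Exact sizing avoids both over-allocation and the risk of `BufferOverflowException` when a block's entries exceed the guessed capacity (entries can overshoot the target block size before the writer rolls to a new block).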
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]