yihua commented on code in PR #12866:
URL: https://github.com/apache/hudi/pull/12866#discussion_r2105312116
##########
hudi-hadoop-common/src/test/java/org/apache/hudi/common/util/TestHFileUtils.java:
##########
@@ -36,8 +37,8 @@
*/
public class TestHFileUtils {
@ParameterizedTest
- @EnumSource(Compression.Algorithm.class)
- public void testGetHFileCompressionAlgorithm(Compression.Algorithm algo) {
+ @EnumSource(CompressionCodec.class)
+ public void testGetHFileCompressionAlgorithm(CompressionCodec algo) {
Review Comment:
Similarly, revisit all HFile-related classes that no longer depend on HBase.
They should live in either the `hudi-io` or the `hudi-common` module.
##########
hudi-io/src/main/java/org/apache/hudi/io/compress/HoodieDecompressor.java:
##########
@@ -41,4 +41,10 @@ int decompress(InputStream compressedInput,
byte[] targetByteArray,
int offset,
int length) throws IOException;
+
+ /**
+ * Compress the data provided.
+ *
+ */
Review Comment:
Update the Javadoc with the `@param`, `@return`, and `@throws` tags for the new method.
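For reference, a possible completed Javadoc for the new method (the parameter name and signature here are assumptions based on the diff context, not the actual Hudi API):

```java
/**
 * Compresses the data provided.
 *
 * @param data uncompressed bytes to compress
 * @return the compressed bytes
 * @throws IOException upon compression error
 */
byte[] compress(byte[] data) throws IOException;
```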
##########
hudi-io/src/main/java/org/apache/hudi/io/compress/CompressionCodec.java:
##########
@@ -41,4 +51,21 @@ public enum CompressionCodec {
public String getName() {
return name;
}
+
+  public static CompressionCodec findCodecByName(String name) {
+    CompressionCodec codec = NAME_TO_COMPRESSION_CODEC_MAP.get(name.toLowerCase());
+    ValidationUtils.checkArgument(
+        codec != null, String.format("Cannot find compression codec: %s", name));
+    return codec;
+  }
+
+ /**
+ * Create a mapping from its name to the compression codec.
+ */
+  private static Map<String, CompressionCodec> createNameToCompressionCodecMap() {
+    Map<String, CompressionCodec> result = new HashMap<>();
+    Arrays.stream(CompressionCodec.values()).forEach(codec -> result.put(codec.getName(), codec));
Review Comment:
nit: this can leverage `.collect` to directly generate a `Map`
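A sketch of the `.collect` variant the nit suggests, using a stand-in enum since the real `CompressionCodec` lives in `hudi-io`:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CodecMapSketch {
  // Stand-in for org.apache.hudi.io.compress.CompressionCodec.
  enum Codec {
    GZIP("gz"),
    NONE("none");

    private final String name;

    Codec(String name) {
      this.name = name;
    }

    String getName() {
      return name;
    }
  }

  // The reviewer's suggestion: build the name-to-codec map in one
  // expression instead of forEach plus a mutable HashMap.
  static Map<String, Codec> createNameToCodecMap() {
    return Arrays.stream(Codec.values())
        .collect(Collectors.toMap(Codec::getName, Function.identity()));
  }

  public static void main(String[] args) {
    Map<String, Codec> map = createNameToCodecMap();
    System.out.println(map.get("gz")); // GZIP
  }
}
```

`Collectors.toMap` also produces an unmodifiable-by-convention result in one step; if two codecs ever shared a name it would throw `IllegalStateException`, which is arguably safer than the silent overwrite of `result.put`.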
##########
hudi-io/src/main/java/org/apache/hudi/io/compress/airlift/HoodieAirliftGzipDecompressor.java:
##########
@@ -50,4 +52,12 @@ public int decompress(InputStream compressedInput,
return readFully(stream, targetByteArray, offset, length);
}
}
+
+  public byte[] compress(byte[] data) throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    try (GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream)) {
+      gzipOutputStream.write(data);
+    }
+    return byteArrayOutputStream.toByteArray();
Review Comment:
Is there a need to return the compressed bytes? Would it be better to pass in the output stream and write directly to it?
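A minimal sketch of the stream-based alternative suggested here; the method name and signature are assumptions rather than the actual Hudi API, and `java.util.zip` is used directly for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamingCompressSketch {
  // Hypothetical stream-based signature: the caller supplies the sink,
  // so no intermediate byte[] copy of the compressed data is needed.
  // Note: closing GZIPOutputStream also closes the underlying stream;
  // an API that must keep the sink open would call finish() instead.
  static void compress(byte[] data, OutputStream output) throws IOException {
    try (GZIPOutputStream gzip = new GZIPOutputStream(output)) {
      gzip.write(data);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] input = "hello hudi".getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    compress(input, sink);

    // Round-trip check: decompressing the sink yields the original bytes.
    try (InputStream in =
        new GZIPInputStream(new ByteArrayInputStream(sink.toByteArray()))) {
      byte[] buf = new byte[input.length];
      int read = 0;
      while (read < buf.length) {
        int n = in.read(buf, read, buf.length - read);
        if (n < 0) {
          break;
        }
        read += n;
      }
      if (!Arrays.equals(buf, input)) {
        throw new AssertionError("round trip failed");
      }
    }
  }
}
```

With this shape, a caller who still wants a `byte[]` just passes a `ByteArrayOutputStream`, while callers writing to a file or socket avoid the extra copy.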
##########
hudi-io/src/main/java/org/apache/hudi/io/compress/airlift/HoodieAirliftGzipDecompressor.java:
##########
@@ -50,4 +52,12 @@ public int decompress(InputStream compressedInput,
return readFully(stream, targetByteArray, offset, length);
}
}
+
+  public byte[] compress(byte[] data) throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    try (GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream)) {
+      gzipOutputStream.write(data);
+    }
+    return byteArrayOutputStream.toByteArray();
+  }
Review Comment:
Could this use `gzipStreams.createOutputStream` for consistency?
##########
hudi-io/src/main/java/org/apache/hudi/io/compress/HoodieDecompressor.java:
##########
@@ -41,4 +41,10 @@ int decompress(InputStream compressedInput,
byte[] targetByteArray,
int offset,
int length) throws IOException;
+
+ /**
Review Comment:
Rename the interface (e.g., to `HoodieCompression`) and update the docs, since it now contains both `compress` and `decompress`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]