yihua commented on code in PR #12866:
URL: https://github.com/apache/hudi/pull/12866#discussion_r2106013411


##########
hudi-io/src/main/java/org/apache/hudi/io/compress/HoodieCompressor.java:
##########
@@ -21,11 +21,12 @@
 
 import java.io.IOException;
 import java.io.InputStream;
+import java.nio.ByteBuffer;
 
 /**
- * Provides decompression on input data.
+ * Compression and decompress input data.

Review Comment:
   ```suggestion
    * Provides compression and decompression on input data.
   ```



##########
hudi-io/src/main/java/org/apache/hudi/io/compress/airlift/HoodieAirliftGzipDecompressor.java:
##########
@@ -50,4 +52,12 @@ public int decompress(InputStream compressedInput,
       return readFully(stream, targetByteArray, offset, length);
     }
   }
+
+  public byte[] compress(byte[] data) throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    try (GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream)) {
+      gzipOutputStream.write(data);
+    }
+    return byteArrayOutputStream.toByteArray();

Review Comment:
   Sounds good.  It looks like we need to know the length of the compressed bytes first while writing the HFile data block, so getting the compressed bytes first is unavoidable.
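   To illustrate the point above, here is a hedged, self-contained sketch (not the PR's actual writer code; the header field names are borrowed from `HFileBlock` for illustration) of why a streaming write does not work: the block header carries the on-disk (compressed) size, so the payload must be fully compressed into memory before the header can be written.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.util.zip.GZIPOutputStream;

   public class BlockWriteSketch {
     // Compress fully into memory first: the block header needs the
     // on-disk (compressed) size before the payload can be written.
     static byte[] writeBlock(byte[] uncompressed) throws IOException {
       ByteArrayOutputStream compressedBuf = new ByteArrayOutputStream();
       try (GZIPOutputStream gzip = new GZIPOutputStream(compressedBuf)) {
         gzip.write(uncompressed);
       }
       byte[] compressed = compressedBuf.toByteArray();

       ByteArrayOutputStream block = new ByteArrayOutputStream();
       DataOutputStream out = new DataOutputStream(block);
       out.writeInt(compressed.length);    // onDiskSizeWithoutHeader
       out.writeInt(uncompressed.length);  // uncompressedSizeWithoutHeader
       out.write(compressed);              // payload follows the header
       out.flush();
       return block.toByteArray();
     }

     public static void main(String[] args) throws IOException {
       byte[] data = "hello hfile".getBytes("UTF-8");
       byte[] block = writeBlock(data);
       System.out.println(block.length);
     }
   }
   ```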



##########
hudi-io/src/main/java/org/apache/hudi/io/hfile/ChecksumType.java:
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.hfile;
+
+/**
+ * Type of checksum used to validate the integrity of data block.
+ * It determines the number of bytes used for checksum.
+ */
+public enum ChecksumType {
+
+  NULL((byte) 0) {
+    @Override
+    public String getName() {
+      return "NULL";
+    }
+  },
+
+  CRC32((byte) 1) {
+    @Override
+    public String getName() {
+      return "CRC32";
+    }
+  },
+
+  CRC32C((byte) 2) {
+    @Override
+    public String getName() {
+      return "CRC32C";
+    }
+  };
+
+  private final byte code;
+
+  public static ChecksumType getDefaultChecksumType() {
+    return ChecksumType.CRC32C;
+  }
+
+  /** returns the name of this checksum type */
+  public abstract String getName();
+
+  private ChecksumType(final byte c) {
+    this.code = c;
+  }
+
+  public byte getCode() {
+    return this.code;
+  }
+
+  /**
+   * Use designated byte value to indicate checksum type.
+   * @return Type associated with passed code.
+   */
+  public static ChecksumType codeToType(final byte b) {
+    for (ChecksumType t : ChecksumType.values()) {
+      if (t.getCode() == b) {
+        return t;
+      }
+    }
+    throw new RuntimeException("Unknown checksum type code " + b);
+  }
+
+  /**
+   * Map a checksum name to a specific type. Do our own names.
+   * @return Type associated with
+   * passed code.

Review Comment:
   nit: update docs
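   For reference, the code-to-type lookup in the hunk above can be exercised with a simplified stand-in (not the actual `org.apache.hudi.io.hfile.ChecksumType` class; this sketch only reproduces the byte-code lookup pattern):

   ```java
   public class ChecksumLookupSketch {
     // Simplified stand-in for ChecksumType: each constant carries
     // a one-byte on-disk code.
     enum Checksum {
       NULL((byte) 0), CRC32((byte) 1), CRC32C((byte) 2);

       private final byte code;

       Checksum(byte code) {
         this.code = code;
       }

       byte getCode() {
         return code;
       }

       // Linear scan over the enum values, as in the PR's codeToType.
       static Checksum codeToType(byte b) {
         for (Checksum t : values()) {
           if (t.getCode() == b) {
             return t;
           }
         }
         throw new RuntimeException("Unknown checksum type code " + b);
       }
     }

     public static void main(String[] args) {
       System.out.println(Checksum.codeToType((byte) 2));
     }
   }
   ```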



##########
hudi-io/src/main/java/org/apache/hudi/io/compress/HoodieCompressor.java:
##########
@@ -41,4 +42,22 @@ int decompress(InputStream compressedInput,
                  byte[] targetByteArray,
                  int offset,
                  int length) throws IOException;
+
+  /**
+   * Compress data stored in byte array.

Review Comment:
   ```suggestion
      * Compresses data stored in byte array.
   ```



##########
hudi-io/src/main/java/org/apache/hudi/io/compress/airlift/HoodieAirliftGzipCompressor.java:
##########
@@ -50,4 +53,20 @@ public int decompress(InputStream compressedInput,
       return readFully(stream, targetByteArray, offset, length);
     }
   }
+
+  @Override
+  public byte[] compress(byte[] data) throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    try (HadoopOutputStream gzipOutputStream = gzipStreams.createOutputStream(byteArrayOutputStream)) {
+      gzipOutputStream.write(data);
+    }
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  @Override
+  public ByteBuffer compress(ByteBuffer uncompressedBytes) throws IOException {
+    byte[] temp = new byte[uncompressedBytes.remaining()];
+    uncompressedBytes.get(temp);
+    return ByteBuffer.wrap(this.compress(temp));
+  }

Review Comment:
   Do we need to keep a second `compress` method?
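   One option (a hypothetical sketch, not the PR's `HoodieCompressor` interface): since the `ByteBuffer` overload only copies into a `byte[]` and delegates, it could be a `default` method on the interface so each implementation does not have to repeat it.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.nio.ByteBuffer;
   import java.util.zip.GZIPOutputStream;

   public class CompressorSketch {
     // Hypothetical slice of a compressor interface: the ByteBuffer
     // overload delegates to the byte[] overload via a default method.
     interface Compressor {
       byte[] compress(byte[] data) throws IOException;

       default ByteBuffer compress(ByteBuffer uncompressed) throws IOException {
         byte[] temp = new byte[uncompressed.remaining()];
         uncompressed.get(temp);
         return ByteBuffer.wrap(compress(temp));
       }
     }

     // Minimal gzip-backed implementation for the sketch.
     static class GzipCompressor implements Compressor {
       @Override
       public byte[] compress(byte[] data) throws IOException {
         ByteArrayOutputStream out = new ByteArrayOutputStream();
         try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
           gzip.write(data);
         }
         return out.toByteArray();
       }
     }

     public static void main(String[] args) throws IOException {
       ByteBuffer in = ByteBuffer.wrap("payload".getBytes("UTF-8"));
       ByteBuffer out = new GzipCompressor().compress(in);
       System.out.println(out.remaining());
     }
   }
   ```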



##########
hudi-io/src/main/java/org/apache/hudi/io/hfile/HFileBlock.java:
##########
@@ -68,43 +77,38 @@ static class Header {
   }
 
   protected final HFileContext context;
-  protected final byte[] byteBuff;
-  protected final int startOffsetInBuff;
-  protected final int sizeCheckSum;
-  protected final int uncompressedEndOffset;
   private final HFileBlockType blockType;
-  protected final int onDiskSizeWithoutHeader;
-  protected final int uncompressedSizeWithoutHeader;
-  protected final int bytesPerChecksum;
-  private boolean isUnpacked = false;
-  protected byte[] compressedByteBuff;
-  protected int startOffsetInCompressedBuff;
 
+  protected Option<HFileBlockReadAttributes> readAttributesOpt;
+  protected Option<HFileBlockWriteAttributes> writeAttributesOpt;

Review Comment:
   I fixed it.  The other constructor should assign `Option.empty()` instead of the variable.
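   The fix described above follows this shape (a sketch using `java.util.Optional` in place of Hudi's `Option`, with hypothetical stand-ins for the read/write attribute classes): each constructor populates exactly one side and explicitly assigns empty to the other.

   ```java
   import java.util.Optional;

   public class BlockAttributesSketch {
     // Hypothetical stand-ins for the read/write attribute classes.
     static class ReadAttributes { }

     static class WriteAttributes { }

     // Each constructor sets one side and assigns Optional.empty()
     // to the other, rather than reusing a variable.
     static class Block {
       final Optional<ReadAttributes> readAttributes;
       final Optional<WriteAttributes> writeAttributes;

       Block(ReadAttributes read) {
         this.readAttributes = Optional.of(read);
         this.writeAttributes = Optional.empty();
       }

       Block(WriteAttributes write) {
         this.readAttributes = Optional.empty();
         this.writeAttributes = Optional.of(write);
       }
     }

     public static void main(String[] args) {
       Block readBlock = new Block(new ReadAttributes());
       System.out.println(readBlock.writeAttributes.isPresent());
     }
   }
   ```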



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
