yihua commented on code in PR #10241:
URL: https://github.com/apache/hudi/pull/10241#discussion_r1456552529


##########
hudi-io/src/main/java/org/apache/hudi/io/hfile/HFileUtils.java:
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.hfile;
+
+import org.apache.hudi.io.compress.CompressionCodec;
+import org.apache.hudi.io.util.IOUtils;
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Util methods for reading and writing HFile.
+ */
+public class HFileUtils {
+  private static final Map<Integer, CompressionCodec> HFILE_COMPRESSION_CODEC_MAP = createCompressionCodecMap();
+
+  /**
+   * Gets the compression codec based on the ID.  This ID is written to the HFile on storage.
+   *
+   * @param id ID indicating the compression codec.
+   * @return compression codec based on the ID.
+   */
+  public static CompressionCodec decodeCompressionCodec(int id) {
+    CompressionCodec codec = HFILE_COMPRESSION_CODEC_MAP.get(id);
+    if (codec == null) {
+      throw new IllegalArgumentException("Compression codec not found for ID: " + id);
+    }
+    return codec;
+  }
+
+  /**
+   * Reads the HFile major version from the input.
+   *
+   * @param bytes  input data.
+   * @param offset offset to start reading.
+   * @return major version of the file.
+   */
+  public static int readMajorVersion(byte[] bytes, int offset) {
+    int ch1 = bytes[offset] & 0xFF;
+    int ch2 = bytes[offset + 1] & 0xFF;
+    int ch3 = bytes[offset + 2] & 0xFF;
+    return ((ch1 << 16) + (ch2 << 8) + ch3);
+  }
+
+  /**
+   * Compares two HFile {@link Key}.
+   *
+   * @param key1 left operand key.
+   * @param key2 right operand key.
+   * @return 0 if equal, < 0 if left is less than right, > 0 otherwise.
+   */
+  public static int compareKeys(Key key1, Key key2) {
+    return IOUtils.compareTo(
+        key1.getBytes(), key1.getContentOffset(), key1.getContentLength(),
+        key2.getBytes(), key2.getContentOffset(), key2.getContentLength());
+  }
+
+  /**
+   * The ID mapping cannot change or else that breaks all existing HFiles out there,
+   * even the ones that are not compressed! (They use the NONE algorithm)
+   * This is because HFile stores the ID to indicate which compression codec is used.
+   *
+   * @return the mapping of ID to compression codec.
+   */
+  private static Map<Integer, CompressionCodec> createCompressionCodecMap() {

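For context outside the diff: the three-byte, big-endian read that `readMajorVersion` performs can be exercised standalone. This sketch mirrors the logic from the snippet above (the class name is mine, for illustration only):

```java
public class ReadMajorVersionDemo {
  // Mirrors readMajorVersion from the diff: the major version is stored
  // in three bytes, most significant byte first (big-endian).
  static int readMajorVersion(byte[] bytes, int offset) {
    int ch1 = bytes[offset] & 0xFF;
    int ch2 = bytes[offset + 1] & 0xFF;
    int ch3 = bytes[offset + 2] & 0xFF;
    return (ch1 << 16) + (ch2 << 8) + ch3;
  }

  public static void main(String[] args) {
    // 0x000003 decodes to major version 3
    byte[] versionBytes = {0x00, 0x00, 0x03};
    System.out.println(readMajorVersion(versionBytes, 0)); // prints 3
  }
}
```

The `& 0xFF` masking matters: Java bytes are signed, so without it a byte value above 0x7F would sign-extend and corrupt the shifted result.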
Review Comment:
   We actually allow users of Hudi to tweak the HFile compression algorithm through [`hoodie.hfile.compression.algorithm`](https://hudi.apache.org/docs/configurations#hoodiehfilecompressionalgorithm), so any possible compression codec may be used.  I'll keep the list here.
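To illustrate the pattern the diff uses (a fixed, unmodifiable ID-to-codec map plus a fail-fast lookup), here is a self-contained sketch. The nested `Codec` enum and the numeric IDs are stand-ins of mine, not the real on-disk values from `createCompressionCodecMap`:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class CodecIdLookupSketch {
  // Stand-in for org.apache.hudi.io.compress.CompressionCodec; the real
  // enum lives in hudi-io. The IDs below are illustrative only -- the real
  // mapping is frozen because it is written into HFiles on storage.
  enum Codec { NONE, GZIP }

  private static final Map<Integer, Codec> ID_TO_CODEC = createMap();

  private static Map<Integer, Codec> createMap() {
    Map<Integer, Codec> m = new HashMap<>();
    m.put(1, Codec.GZIP); // illustrative ID
    m.put(2, Codec.NONE); // illustrative ID
    // Unmodifiable view: the mapping must never change once HFiles exist.
    return Collections.unmodifiableMap(m);
  }

  // Fail-fast lookup, same shape as decodeCompressionCodec in the diff.
  static Codec decode(int id) {
    Codec c = ID_TO_CODEC.get(id);
    if (c == null) {
      throw new IllegalArgumentException("Compression codec not found for ID: " + id);
    }
    return c;
  }

  public static void main(String[] args) {
    System.out.println(decode(2)); // prints NONE
  }
}
```

Throwing on an unknown ID (rather than defaulting to `NONE`) is the safer choice here: silently decompressing with the wrong codec would surface as corrupt data much later.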


