Re: [PR] PARQUET-1647: [Java][Parquet] Implement FLOAT16 logical type [parquet-mr]

via GitHub Wed, 18 Oct 2023 00:22:56 -0700


gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1363371525



##########
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##########
@@ -0,0 +1,307 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.util.Arrays;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies a float16 as having the following format:
+ * <ul>
+ * <li>Sign bit: 1 bit</li>
+ * <li>Exponent width: 5 bits</li>
+ * <li>Significand: 10 bits</li>
+ * </ul>
+ *
+ * <p>The format is laid out as follows:</p>
+ * <pre>
+ * 1   11111   1111111111
+ * ^   --^--   -----^----
+ * sign  |          |_______ significand
+ *       |
+ *      -- exponent
+ * </pre>
+ * Half-precision floating points can be useful to save memory and/or
+ * bandwidth at the expense of range and precision when compared to single-precision
+ * floating points (float32).
+ * Ref: https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/libcore/util/FP16.java
+ */
+public class Float16 {
+  // Lowest (most negative) finite value a half-precision float may have.
+  public static final short LOWEST_VALUE = (short) 0xfbff;
+  // Maximum positive finite value a half-precision float may have.
+  public static final short MAX_VALUE = (short) 0x7bff;
+  // Smallest positive normal value a half-precision float may have.
+  public static final short MIN_NORMAL = (short) 0x0400;
+  // Smallest positive non-zero value a half-precision float may have.
+  public static final short MIN_VALUE = (short) 0x0001;
+  // Positive 0 of type half-precision float.
+  public static final short POSITIVE_ZERO = (short) 0x0000;
+  // Negative 0 of type half-precision float.
+  public static final short NEGATIVE_ZERO = (short) 0x8000;
+  // Byte array in little endian for positive 0 of type half-precision float.
+  public static final byte[] POSITIVE_ZERO_BYTES_LITTLE_ENDIAN = Float16.toBytesLittleEndian(Float16.POSITIVE_ZERO);
+  // Byte array in little endian for negative 0 of type half-precision float.
+  public static final byte[] NEGATIVE_ZERO_BYTES_LITTLE_ENDIAN = Float16.toBytesLittleEndian(Float16.NEGATIVE_ZERO);
+  // A Not-a-Number representation of a half-precision float.
+  public static final short NaN = (short) 0x7e00;
+  // Positive infinity of type half-precision float.
+  static final short POSITIVE_INFINITY = (short) 0x7c00;
+  // Negative infinity of type half-precision float.
+  static final short NEGATIVE_INFINITY = (short) 0xfc00;
+
+  // The bitmask to AND a number with to obtain the sign bit.
+  private static final int SIGN_MASK                 = 0x8000;
+  // The offset to shift by to obtain the exponent bits.
+  private static final int EXPONENT_SHIFT            = 10;
+  // The bitmask to AND with a number shifted right by EXPONENT_SHIFT, to obtain exponent bits.
+  private static final int SHIFTED_EXPONENT_MASK     = 0x1f;
+  // The bitmask to AND a number with to obtain significand bits.
+  private static final int SIGNIFICAND_MASK          = 0x3ff;
+  // The offset of the exponent from the actual value.
+  private static final int EXPONENT_BIAS             = 15;
+  // The offset to shift by to obtain the sign bit.
+  private static final int SIGN_SHIFT                = 15;
+  // The bitmask to AND with to obtain exponent and significand bits.
+  private static final int EXPONENT_SIGNIFICAND_MASK = 0x7fff;
+
+  private static final int FP32_SIGN_SHIFT            = 31;
+  private static final int FP32_EXPONENT_SHIFT        = 23;
+  private static final int FP32_SHIFTED_EXPONENT_MASK = 0xff;
+  private static final int FP32_SIGNIFICAND_MASK      = 0x7fffff;
+  private static final int FP32_EXPONENT_BIAS         = 127;
+  private static final int FP32_QNAN_MASK             = 0x400000;
+  private static final int FP32_DENORMAL_MAGIC = 126 << 23;
+  private static final float FP32_DENORMAL_FLOAT = Float.intBitsToFloat(FP32_DENORMAL_MAGIC);
+
+  /**
+   * Converts the specified half-precision float value into a
+   * single-precision float value. The following special cases are handled:
+   * If the input is NaN, the returned value is Float NaN.
+   * If the input is POSITIVE_INFINITY or NEGATIVE_INFINITY, the returned value is respectively
+   *   Float POSITIVE_INFINITY or Float NEGATIVE_INFINITY.
+   * If the input is 0 (positive or negative), the returned value is +/-0.0f.
+   * Otherwise, the returned value is a normalized single-precision float value.
+   *
+   * @param h The half-precision float value to convert to single-precision
+   * @return A normalized single-precision float value
+   */
+  public static float toFloat(short h) {

Review Comment:
   The question is what we want to use this class for. Inside parquet-mr we
need this functionality to add `FLOAT16` support to `PrimitiveComparator`
and `PrimitiveStringifier`. For that we only need a utility class in the same
module/package and do not need public methods at all.
   If we want to support our API users, then we might need public methods that
can read/write from/to 2-byte `Binary` values. But that would make this the
first logical type for which we add such support for our users.
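
   To make the first option concrete, here is a minimal sketch of what a
package-private helper could look like. It is an illustration only, not the
code in this PR: the class name `Float16Sketch` and the method
`fromBytesLittleEndian` are hypothetical, it works on a plain `byte[]` rather
than on `Binary`, and the conversion uses straightforward field decoding
instead of the bit tricks of the referenced Android implementation.

```java
// Hypothetical package-private helper; nothing here is public API.
final class Float16Sketch {
  private Float16Sketch() {}

  // Reassemble the 16 raw bits from two bytes stored in little-endian order.
  static short fromBytesLittleEndian(byte[] bytes, int offset) {
    return (short) ((bytes[offset] & 0xff) | ((bytes[offset + 1] & 0xff) << 8));
  }

  // Split the 16 raw bits into two bytes, least significant byte first.
  static byte[] toBytesLittleEndian(short h) {
    return new byte[] {(byte) (h & 0xff), (byte) ((h >> 8) & 0xff)};
  }

  // Decode sign (1 bit), exponent (5 bits) and significand (10 bits).
  static float toFloat(short h) {
    int bits = h & 0xffff;
    int sign = bits >>> 15;
    int exponent = (bits >>> 10) & 0x1f;
    int significand = bits & 0x3ff;

    float magnitude;
    if (exponent == 0) {
      // Zero or subnormal: significand * 2^-24.
      magnitude = significand * 0x1.0p-24f;
    } else if (exponent == 0x1f) {
      // All exponent bits set: infinity if the significand is 0, else NaN.
      magnitude = (significand == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
    } else {
      // Normal: (1 + significand / 2^10) * 2^(exponent - 15).
      magnitude = Math.scalb(1.0f + significand / 1024.0f, exponent - 15);
    }
    return sign == 0 ? magnitude : -magnitude;
  }
}
```

   With only static, package-private methods like these, a comparator or
stringifier in the same package could decode the two bytes and compare or
print the resulting `float` without exposing anything to API users.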



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
