wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1330863760

##########
parquet-common/src/test/java/org/apache/parquet/util/TestFloat16.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.util;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.apache.parquet.util.Float16.*;

Review Comment:
   Please do not use `import *`.

##########
parquet-common/src/test/java/org/apache/parquet/util/TestFloat16.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.util;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.apache.parquet.util.Float16.*;
+
+public class TestFloat16

Review Comment:
   Should we include all tests from the following file?
   https://android.googlesource.com/platform/libcore/+/refs/heads/main/luni/src/test/java/libcore/libcore/util/FP16Test.java

##########
parquet-column/src/main/java/org/apache/parquet/schema/Types.java:
##########
@@ -566,6 +571,13 @@ private Optional<Boolean> checkBinaryPrimitiveType(LogicalTypeAnnotation logical
       return Optional.of(true);
     }
+    private Optional<Boolean> checkFloat16BinaryPrimitiveType(int l, LogicalTypeAnnotation logicalTypeAnnotation) {
+      Preconditions.checkState(
+          primitiveType == PrimitiveTypeName.BINARY && length == l,

Review Comment:
   Float16 annotates the FIXED_LENGTH_BYTE_ARRAY type, not BINARY. This method should be removed.

##########
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveComparator.java:
##########
@@ -276,4 +279,24 @@ public String toString() {
       return "BINARY_AS_SIGNED_INTEGER_COMPARATOR";
     }
   };
+
+  /**
+   * This comparator is for comparing two float16 values represented in 2 bytes binary.
+   */
+  static final PrimitiveComparator<Binary> BINARY_AS_FLOAT16_COMPARATOR = new BinaryComparator() {

Review Comment:
   +1

##########
parquet-hadoop/src/test/java/org/apache/parquet/format/converter/TestParquetMetadataConverter.java:
##########
@@ -990,6 +990,30 @@ private void testUseStatsWithSignedSortOrder(StatsHelper helper) {
     }
   }
+  @Test
+  public void testFloat16Stats() {
+    BinaryStatistics bStats = new BinaryStatistics();

Review Comment:
   +1

##########
parquet-common/src/main/java/org/apache/parquet/util/Float16.java:
##########
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.util;

Review Comment:
   It would be reasonable to move it to `org.apache.parquet.type` under `parquet-common`.

##########
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveComparator.java:
##########
@@ -22,8 +22,11 @@
 import java.io.Serializable;
 import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
 import java.util.Comparator;
+import static org.apache.parquet.util.Float16.toFloat;

Review Comment:
   Please avoid the static import here.

##########
parquet-common/src/main/java/org/apache/parquet/util/Float16.java:
##########
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.util;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ * <ul>
+ * <li>Sign bit: 1 bit</li>
+ * <li>Exponent width: 5 bits</li>
+ * <li>Significand: 10 bits</li>
+ * </ul>
+ *
+ * <p>The format is laid out as follows:</p>
+ * <pre>
+ * 1   11111   1111111111
+ * ^   --^--   -----^----
+ * sign  |          |_______ significand
+ *       |
+ *       -- exponent
+ * </pre>
+ * Half-precision floating points can be useful to save memory and/or
+ * bandwidth at the expense of range and precision when compared to single-precision
+ * floating points (float32).
+ * Ref: https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/libcore/util/FP16.java

Review Comment:
   It would be nice to include the other methods from the reference code. At the very least, they are useful for debugging.

##########
parquet-common/src/main/java/org/apache/parquet/util/Float16.java:
##########
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.util;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ * <ul>
+ * <li>Sign bit: 1 bit</li>
+ * <li>Exponent width: 5 bits</li>
+ * <li>Significand: 10 bits</li>
+ * </ul>
+ *
+ * <p>The format is laid out as follows:</p>
+ * <pre>
+ * 1   11111   1111111111
+ * ^   --^--   -----^----
+ * sign  |          |_______ significand
+ *       |
+ *       -- exponent
+ * </pre>
+ * Half-precision floating points can be useful to save memory and/or
+ * bandwidth at the expense of range and precision when compared to single-precision
+ * floating points (float32).
+ * Ref: https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/libcore/util/FP16.java
+ */
+public class Float16

Review Comment:
   We need to think about whether or not to expose `Float16` to external users. If it is only used internally when collecting column statistics, we probably should not make this class public. Users always read and write float16-typed data via the `Binary` type. My preference is not to make it public. WDYT? @zhangjiashen @benibus

##########
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java:
##########
@@ -448,4 +449,16 @@ private void appendHex(byte[] array, int offset, int length, StringBuilder build
       }
     }
   };
+
+  static final PrimitiveStringifier FLOAT16_STRINGIFIER = new BinaryStringifierBase("FLOAT16_STRINGIFIER") {
+
+    @Override
+    String stringifyNotNull(Binary value) {
+      if (value.length() != 2) {
+        return BINARY_INVALID;
+      }
+      ByteBuffer buffer = value.toByteBuffer().order(ByteOrder.LITTLE_ENDIAN);
+      return DEFAULT_STRINGIFIER.stringify(toFloat(buffer.getShort(buffer.position())));

Review Comment:
   Should we wrap this by adding a `Float16.toString()`?

##########
parquet-column/src/main/java/org/apache/parquet/schema/Types.java:
##########
@@ -465,6 +465,11 @@ public Optional<Boolean> visit(LogicalTypeAnnotation.UUIDLogicalTypeAnnotation u
       return checkFixedPrimitiveType(LogicalTypeAnnotation.UUIDLogicalTypeAnnotation.BYTES, uuidLogicalType);
     }
+    @Override
+    public Optional<Boolean> visit(LogicalTypeAnnotation.Float16LogicalTypeAnnotation float16LogicalType) {
+      return checkFloat16BinaryPrimitiveType(LogicalTypeAnnotation.Float16LogicalTypeAnnotation.BYTES, float16LogicalType);

Review Comment:
   Calling `checkFixedPrimitiveType` is enough.

##########
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java:
##########
@@ -448,4 +449,16 @@ private void appendHex(byte[] array, int offset, int length, StringBuilder build
       }
     }
   };
+
+  static final PrimitiveStringifier FLOAT16_STRINGIFIER = new BinaryStringifierBase("FLOAT16_STRINGIFIER") {
+
+    @Override
+    String stringifyNotNull(Binary value) {
+      if (value.length() != 2) {
+        return BINARY_INVALID;

Review Comment:
   Should we throw instead of returning an invalid value?

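For illustration only, the two stringifier suggestions above (a `toString` wrapper, and throwing on a bad length) could look roughly like the sketch below. This is not the PR's code: the class name `Float16Sketch` and both method signatures are hypothetical, and a plain `byte[]` stands in for Parquet's `Binary` so that the example is self-contained. It assumes the value is stored as 2 little-endian bytes, as in the hunk above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of parquet-mr: decodes IEEE 754 binary16 values.
final class Float16Sketch {
  private Float16Sketch() {}

  // Converts the 16 raw bits of a half-precision value to a float.
  static float toFloat(short h) {
    int bits = h & 0xFFFF;
    int sign = (bits & 0x8000) << 16;   // move the sign bit to bit 31
    int exponent = (bits >>> 10) & 0x1F;
    int mantissa = bits & 0x3FF;

    if (exponent == 0) {
      if (mantissa == 0) {
        return Float.intBitsToFloat(sign);          // signed zero
      }
      float value = mantissa * (1.0f / (1 << 24));  // subnormal: mantissa * 2^-24
      return sign != 0 ? -value : value;
    }
    if (exponent == 0x1F) {
      // Infinity (mantissa == 0) or NaN (mantissa != 0)
      return Float.intBitsToFloat(sign | 0x7F800000 | (mantissa << 13));
    }
    // Normal number: rebias the exponent from 15 to 127
    return Float.intBitsToFloat(sign | ((exponent + 112) << 23) | (mantissa << 13));
  }

  // Stringifies a float16 stored as 2 little-endian bytes; throws on any other length.
  static String toString(byte[] littleEndianBytes) {
    if (littleEndianBytes.length != 2) {
      throw new IllegalArgumentException(
          "float16 must be exactly 2 bytes, got " + littleEndianBytes.length);
    }
    short bits = ByteBuffer.wrap(littleEndianBytes).order(ByteOrder.LITTLE_ENDIAN).getShort();
    return Float.toString(toFloat(bits));
  }
}
```

With such a wrapper, the stringifier body would reduce to a single call, and a value of the wrong length would fail loudly rather than being rendered as an invalid placeholder. Whether throwing is the right behavior for a stringifier is exactly the open question raised in the comment above.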
##########
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java:
##########
@@ -24,6 +24,7 @@
 import static java.util.concurrent.TimeUnit.MINUTES;
 import static java.util.concurrent.TimeUnit.NANOSECONDS;
 import static java.util.concurrent.TimeUnit.SECONDS;
+import static org.apache.parquet.util.Float16.toFloat;

Review Comment:
   IMO, we should only import `org.apache.parquet.util.Float16` and call `Float16.toFloat` explicitly.

##########
parquet-common/src/main/java/org/apache/parquet/util/Float16.java:
##########
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.util;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ * <ul>
+ * <li>Sign bit: 1 bit</li>
+ * <li>Exponent width: 5 bits</li>
+ * <li>Significand: 10 bits</li>
+ * </ul>
+ *
+ * <p>The format is laid out as follows:</p>
+ * <pre>
+ * 1   11111   1111111111
+ * ^   --^--   -----^----
+ * sign  |          |_______ significand
+ *       |
+ *       -- exponent
+ * </pre>
+ * Half-precision floating points can be useful to save memory and/or
+ * bandwidth at the expense of range and precision when compared to single-precision
+ * floating points (float32).
+ * Ref: https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/libcore/util/FP16.java
+ */
+public class Float16

Review Comment:
   I am more concerned about interoperability between different implementations. Perhaps the C++ implementation can write some Parquet files with the float16 type, so that we can make sure the parquet-mr reader restores the exact values?

##########
parquet-column/src/test/java/org/apache/parquet/schema/TestTypeBuildersWithLogicalTypes.java:
##########
@@ -403,6 +414,18 @@ public void testUUIDLogicalType() {
         () -> Types.required(BINARY).as(uuidType()).named("uuid_field").toString());
   }
+  @Test
+  public void testFloat16LogicalType() {
+    assertEquals(
+        "required binary float16_field (FLOAT16)",

Review Comment:
   binary -> fixed_length_byte_array(2)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org