Hi everyone, I would like to propose adding a half-precision floating point type to the Parquet format specification, in accordance with the active proposal here:
- https://github.com/apache/parquet-format/pull/184 To summarize, the current proposal would introduce a Float16 logical type, represented by a little-endian 2-byte FixedLenByteArray. The value's encoding would adhere to the IEEE-754 standard [1]. Furthermore, implementations should ensure that any value comparisons and ordering requirements (mainly for column statistics) emulate the behavior of native (i.e. physical) floating point types. As for how this would look in practice, there are currently several implementations of this proposal that are more or less complete: - C++ (and Python): https://github.com/apache/arrow/pull/36073 - Java: https://github.com/apache/parquet-mr/pull/1142 - Go: https://github.com/apache/arrow/pull/37599 Of course, we're prepared to make adjustments to the implementations as needed, since the format additions will need to be approved before those PRs are merged. I should also note that naming conventions haven't been extensively discussed, so feel free to chime in if you have a strong preference for HALF or HALF_FLOAT over FLOAT16! This vote will be open for at least 72 hours. [ ] +1 Add this type to the format specification [ ] +0 [ ] -1 Do not add this type to the format specification because... Thanks! Ben [1]: https://en.wikipedia.org/wiki/Half-precision_floating-point_format -- *Ben Harkins* *Software Engineer | Voltron Data* *Email*: [email protected]
