Hi everyone,

I would like to propose adding a half-precision floating point type to
the Parquet format specification, in accordance with the active
proposal here:


   - https://github.com/apache/parquet-format/pull/184

To summarize, the current proposal would introduce a Float16 logical
type, represented by a little-endian 2-byte FixedLenByteArray. The
value's encoding would adhere to the IEEE-754 standard [1].
Furthermore, implementations should ensure that any value comparisons
and ordering requirements (mainly for column statistics) emulate the
behavior of native (i.e. physical) floating point types.

As for how this would look in practice, there are currently several
implementations of this proposal that are more or less complete:


   - C++ (and Python): https://github.com/apache/arrow/pull/36073
   - Java: https://github.com/apache/parquet-mr/pull/1142
   - Go: https://github.com/apache/arrow/pull/37599

Of course, we're prepared to make adjustments to the implementations as
needed, since the format additions will need to be approved before those
PRs are merged. I should also note that naming conventions haven't been
extensively discussed, so feel free to chime in if you have a strong
preference for HALF or HALF_FLOAT over FLOAT16!


This vote will be open for at least 72 hours.

[ ] +1 Add this type to the format specification
[ ] +0
[ ] -1 Do not add this type to the format specification because...

Thanks!

Ben

[1]: https://en.wikipedia.org/wiki/Half-precision_floating-point_format


-- 
*Ben Harkins*
*Software Engineer | Voltron Data*
*Email*: [email protected]

Reply via email to