Recently someone opened ARROW-2145
requesting support for non-finite values, such as NaN and infinity.
It may seem like a “no-brainer” to implement this, but there’s no real
consistency in how to implement it, or *even whether to implement it at all*:
- Java BigDecimal: raises an exception for NaN or infinity, as per the docs
- Boost.Multiprecision supports it, but not for fixed-precision decimal
numbers (only for cpp_bin_float/cpp_dec_float, which are arbitrary-precision
floating point, not fixed point)
- Python supports it using flags and special string exponents (and it
supports both signaling and quiet NaNs)
- Impala doesn’t support it (it returns NULL when you try to CAST a
non-finite DOUBLE AS DECIMAL)
- Postgres supports it with its numeric type, by using the sign member of
the C struct backing numeric values
- MySQL doesn’t even support NaN/inf!
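For reference, here is roughly what the Python behavior above looks like with only the standard library `decimal` module, under the default context. It shows the predicate methods (the "flag" view), the special string exponents that `as_tuple()` reports for non-finite values, and the quiet vs. signaling NaN distinction. This is a quick illustration, not an exhaustive description of the module.

```python
from decimal import Decimal, InvalidOperation

# Quiet NaN, signaling NaN and infinity are all valid decimal.Decimal values.
qnan = Decimal("NaN")
snan = Decimal("sNaN")
pinf = Decimal("Infinity")

# Predicate methods expose the "flag" view of each special value.
print(qnan.is_nan(), qnan.is_qnan(), qnan.is_snan())  # True True False
print(snan.is_nan(), snan.is_qnan(), snan.is_snan())  # True False True
print(pinf.is_infinite(), pinf.is_finite())           # True False

# as_tuple() reports special values with string exponents instead of ints:
# 'n' for a quiet NaN, 'N' for a signaling NaN, 'F' for infinity.
# A finite value such as 1.25 has an integer exponent (-2).
for d in (qnan, snan, pinf, Decimal("1.25")):
    print(d, d.as_tuple().exponent)

# A quiet NaN propagates through arithmetic silently; a signaling NaN
# raises InvalidOperation because the default context traps that signal.
print(qnan + 1)  # NaN
try:
    snan + 1
except InvalidOperation:
    print("sNaN + 1 raised InvalidOperation")
```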
The lack of support for these values across languages and systems likely
stems from the fact that fixed-precision arithmetic by definition operates on
finite values, and NaN and infinity are not finite, so they are not supported.
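To make the fixed-point argument concrete: an Arrow Decimal128 slot holds a 128-bit two's complement integer (the unscaled value), and the precision and scale live in the type, so a slot means `unscaled * 10**-scale`. The simplified Python model below (the helpers `to_unscaled` and `from_unscaled` are hypothetical, not Arrow code) shows that encoding only makes sense for finite inputs, while decoding any integer always yields a finite number.

```python
from decimal import Decimal


def to_unscaled(value: Decimal, precision: int, scale: int) -> int:
    """Hypothetical helper: the integer a decimal(precision, scale) slot would
    hold, i.e. value * 10**scale, computed exactly from the digit tuple."""
    if not value.is_finite():
        # A fixed-point slot is just an integer scaled by 10**-scale.
        # NaN and +/-Infinity are not finite, so no integer represents them.
        raise ValueError(f"{value} is not finite")
    sign, digits, exponent = value.as_tuple()
    coefficient = int("".join(map(str, digits)))
    shift = exponent + scale
    if shift >= 0:
        unscaled = coefficient * 10 ** shift
    else:
        # Dropping fractional digits would silently change the value.
        unscaled, remainder = divmod(coefficient, 10 ** -shift)
        if remainder:
            raise ValueError(f"{value} has more than {scale} fractional digits")
    if unscaled >= 10 ** precision:
        raise ValueError(f"{value} needs more than {precision} digits")
    return -unscaled if sign else unscaled


def from_unscaled(unscaled: int, scale: int) -> Decimal:
    """Inverse mapping: every integer decodes to a finite Decimal."""
    return Decimal(f"{unscaled}E{-scale}")


print(to_unscaled(Decimal("1.25"), precision=5, scale=2))
print(from_unscaled(125, scale=2))
for special in (Decimal("NaN"), Decimal("Infinity")):
    try:
        to_unscaled(special, precision=5, scale=2)
    except ValueError as exc:
        print("rejected:", exc)
```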
We could go down this rabbit hole in the name of supporting Python’s
decimal.Decimal(<non-finite value>), but I’m not sure how useful it would be.
Nothing except in-memory C++ Arrow arrays would be able to operate
on these values (I suppose we could add a wrapper around BigDecimal that
has the desired behavior).
For example, writing Arrow arrays containing Decimal128 values with NaNs
or infinities to a Parquet file seems untenable.
Additionally, if we decided to implement it, we’d likely have to take
something like the flag approach, which would require a change to the
metadata (not necessarily a bad thing) that would add two bitmaps to Arrow
Decimal arrays: one indicating NaN-ness and one indicating inf-ness
(that’s a ton of overhead IMO, when I think it’s likely that most values are
finite).
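A rough sketch of what that two-bitmap layout could look like, in plain Python rather than Arrow's C++. Everything here is an assumption for illustration, not a proposed spec: the class name `NonFiniteDecimalArray`, the least-significant-bit numbering, storing the sign of an infinity in the value slot, and collapsing signaling NaNs into a single NaN bit.

```python
from dataclasses import dataclass, field
from decimal import Decimal, localcontext
from typing import List, Optional


def _set_bit(bitmap: bytearray, i: int) -> None:
    bitmap[i // 8] |= 1 << (i % 8)  # least-significant-bit numbering


def _get_bit(bitmap: bytearray, i: int) -> bool:
    return bool(bitmap[i // 8] & (1 << (i % 8)))


@dataclass
class NonFiniteDecimalArray:
    """Hypothetical layout, NOT the Arrow format: a validity bitmap, one
    unscaled integer per slot, plus separate NaN and infinity bitmaps."""
    scale: int
    values: List[int] = field(default_factory=list)
    validity: bytearray = field(default_factory=bytearray)
    is_nan: bytearray = field(default_factory=bytearray)
    is_inf: bytearray = field(default_factory=bytearray)

    def append(self, value: Optional[Decimal]) -> None:
        i = len(self.values)
        if i % 8 == 0:
            # Every bitmap grows by one byte per 8 slots, whether or not
            # any non-finite value is ever stored.
            for bitmap in (self.validity, self.is_nan, self.is_inf):
                bitmap.append(0)
        if value is None:
            self.values.append(0)  # null slot: only the validity bit matters
            return
        _set_bit(self.validity, i)
        if value.is_nan():
            _set_bit(self.is_nan, i)  # quiet and signaling NaN both land here
            self.values.append(0)
        elif value.is_infinite():
            _set_bit(self.is_inf, i)
            # Assumption: the sign of the infinity lives in the value slot.
            self.values.append(-1 if value.is_signed() else 1)
        else:
            with localcontext() as ctx:
                ctx.prec = 50  # assumes inputs fit in Decimal128's 38 digits
                unscaled = value.scaleb(self.scale)
            if unscaled != unscaled.to_integral_value():
                raise ValueError(f"{value} has more than {self.scale} fractional digits")
            self.values.append(int(unscaled))

    def get(self, i: int) -> Optional[Decimal]:
        if not _get_bit(self.validity, i):
            return None
        if _get_bit(self.is_nan, i):
            return Decimal("NaN")
        if _get_bit(self.is_inf, i):
            return Decimal("-Infinity") if self.values[i] < 0 else Decimal("Infinity")
        return Decimal(f"{self.values[i]}E{-self.scale}")


arr = NonFiniteDecimalArray(scale=2)
for v in [Decimal("1.25"), None, Decimal("NaN"), Decimal("-Infinity")]:
    arr.append(v)
print([arr.get(i) for i in range(4)])
# Each extra bitmap costs ceil(n / 8) bytes for n values.
print(len(arr.is_nan), len(arr.is_inf))
```

In this sketch both extra bitmaps are allocated and checked on every read, even for an array that only ever holds finite values.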
I’m skeptical about whether we should support this.