Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

Tim Armstrong Mon, 19 Feb 2018 13:50:16 -0800

We could drop NaNs from the min/max stats and require that -0 be normalised to
+0 when writing them out. That would remove the writer's remaining degrees of
freedom, and then straightforward comparison with =, <, >, >=, <=, != would
work as expected.
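
A minimal sketch of the writer-side rule above, assuming the stats are computed over plain Python floats (IEEE 754 doubles). NaNs are skipped and both zeros are stored as +0.0. `compute_min_max` is a hypothetical helper, not a Parquet API, and returning no stats for an all-NaN input is likewise an assumption.

```python
import math
from typing import Iterable, Optional, Tuple


def compute_min_max(values: Iterable[float]) -> Optional[Tuple[float, float]]:
    """Hypothetical writer-side min/max: skip NaNs, normalise -0.0 to +0.0."""
    lo: Optional[float] = None
    hi: Optional[float] = None
    for v in values:
        if math.isnan(v):
            continue  # NaNs are dropped from the statistics
        if v == 0.0:
            v = 0.0  # true for both -0.0 and +0.0, so both become +0.0
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    if lo is None or hi is None:
        return None  # assumption: all-NaN or empty input gets no min/max
    return lo, hi
```

Because `v == 0.0` holds for both zeros, the stored bounds never carry a negative zero, whatever order the values arrive in, so a reader can compare them with plain <, <=, etc.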


On Mon, Feb 19, 2018 at 8:04 AM, Zoltan Ivanfi <z...@cloudera.com> wrote:

> Hi,
>
> Tim, I added your suggestion to introduce a new ColumnOrder to PARQUET-1222
> <https://issues.apache.org/jira/browse/PARQUET-1222> as the preferred
> solution.
>
> Alex, not writing min/max if there is a NaN is indeed a feasible quick fix,
> but I think it would be better to just ignore NaNs for the purposes of
> min/max stats. For reading, we can ignore stats that contain a NaN. We also
> shouldn't use stats when looking for a NaN. -0 and +0 will still be
> problematic, though.
>
> Jim, fmax is indeed very close to IEEE-754's maxNum, but its handling of -0
> and +0 is implementation-dependent, as Zoltan Borok-Nagy pointed out to me:
> "This function is not required to be sensitive to the sign of zero, although
> some implementations additionally enforce that if one argument is +0 and the
> other is -0, then +0 is returned." [1
> <http://en.cppreference.com/w/c/numeric/math/fmax>]
>
> Br,
>
> Zoltan
>
>
>
> On Fri, Feb 16, 2018 at 6:57 PM Jim Apple <jbap...@cloudera.com> wrote:
>
> > On Fri, Feb 16, 2018 at 9:44 AM, Zoltan Borok-Nagy
> > <borokna...@cloudera.com> wrote:
> > > I would just like to mention that the fmax() / fmin() functions in the
> > > C/C++ math library follow the aforementioned IEEE 754-2008 min and max
> > > specification:
> > > http://en.cppreference.com/w/c/numeric/math/fmax
> > >
> > > I think this behavior is also the most intuitive and useful with regard
> > > to statistics. If we want to select the max value, I think it's
> > > reasonable to ignore nulls and NaNs.
> >
> > It should be noted that this is different from the total ordering
> > predicate. With that predicate, -NaN < -inf < negative numbers < -0.0
> > < +0.0 < positive numbers < +inf < +NaN.
> >
> > fmax appears to be closest to IEEE-754's maxNum, but it may not match
> > in some corner cases (-0.0, signalling NaN); I'm not 100% sure on
> > that.
> >
>
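
Zoltan's reader-side rules (ignore stats that contain a NaN, and never use stats when looking for a NaN) could be sketched as below for an equality predicate. This assumes min/max are available as Python floats, or None when no stats were written; `can_skip_for_equals` is a hypothetical helper name, not part of any Parquet library.

```python
import math
from typing import Optional


def can_skip_for_equals(stat_min: Optional[float], stat_max: Optional[float],
                        value: float) -> bool:
    """Hypothetical pruning check for the predicate `column == value`.

    Returns True only when the min/max stats prove that no row can match.
    """
    if stat_min is None or stat_max is None:
        return False  # no stats written: the data must be read
    if math.isnan(stat_min) or math.isnan(stat_max):
        return False  # stats that contain a NaN are ignored
    if math.isnan(value):
        return False  # stats are never used when looking for a NaN
    # Plain IEEE comparisons: -0.0 and +0.0 compare equal here.
    return value < stat_min or value > stat_max
```

The check is deliberately conservative: every doubtful case returns False, which only costs extra reading and never drops matching rows.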

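A small sketch contrasting Jim's total-ordering predicate with an fmax()-style, NaN-ignoring max. It assumes 64-bit IEEE 754 doubles; `total_order_key` and `fmax_like` are hypothetical helper names. The sign-flipping bit trick is a common way to build a total-order sort key and reproduces the order quoted above, but it is only an approximation of IEEE 754 totalOrder, not a reference implementation.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Sort key giving a totalOrder-style ordering of 64-bit doubles.

    The bit pattern is reinterpreted as a signed 64-bit integer. For
    negative values the 63 magnitude bits are flipped so that larger
    magnitudes sort lower, giving:
    -NaN < -inf < negatives < -0.0 < +0.0 < positives < +inf < +NaN.
    """
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    if bits < 0:
        bits ^= 0x7FFF_FFFF_FFFF_FFFF  # flip magnitude bits, keep the sign
    return bits


def fmax_like(a: float, b: float) -> float:
    """NaN-ignoring max in the spirit of C's fmax().

    If exactly one argument is NaN, the other is returned. On a
    -0.0/+0.0 tie this sketch returns `a`; C's fmax leaves the sign of
    zero unspecified.
    """
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a >= b else b
```

For a column holding 1.0 and a positive NaN (e.g. `math.copysign(math.nan, 1.0)`), `max(values, key=total_order_key)` returns the NaN, whereas `fmax_like` returns 1.0; that difference is why the two definitions lead to different min/max stats.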