Re: Sorting floating-point values, and NaN

Andrei Alexandrescu Tue, 12 Nov 2013 12:06:02 -0800

On 11/12/13 9:59 AM, Vladimir Panteleev wrote:

On Tuesday, 12 November 2013 at 16:48:01 UTC, Andrei Alexandrescu wrote:

On 11/12/13 8:00 AM, Vladimir Panteleev wrote:

1) NaN values are accepted as input almost anywhere where doubles are.
std.conv recognizes the string "nan" and converts it to double.nan,


They are still singular values. Code has no business comparing them
unwittingly.

2) It is unreasonable to expect every D user out there dealing with
floating point numbers to be forced to check every source of floating
point values in their program against NaNs.


I find that perfectly reasonable. How do you mean that?


Both of the above hinge on the relative obscurity of NaNs and the
problems they could cause. People who are not specifically familiar with
NaNs and how they interact with other floating-point values will treat
floating-point values as "just numbers", and expect them to compare and
sort in the same way as integers.


I think that's their problem, not ours.

The language allows NaN floating point number with the semantics of
IEEE 754. We hardly have any leeway in the matter unless we want to
irate a lot of people for no good reason.

It's a judgment call, not a cut and dried decision.


Agreed - however, I think there might be more than one way to deal with
this.

1) As above, introduce a "less" function which guarantees transitivity
for basic types, and use it in examples throughout.


This is a bit much.

2) Improve documentation and visibility of this problem. For example,
add this to the documentation of sort:

"The predicate must be transitive (`a<b && b<c` implies `a<c`) in order
for `sort` to behave properly - otherwise, the program may fail on
certain inputs (but not others) when not compiled in release mode. Note
that `a<b` is not transitive for floating points, because the expression
will always be `false` when either `a` or `b` is `NaN`."


Great idea, just file an enh request or just write the pull request.

We should simply add an assert to isSorted.


How would this be implemented, anyway? I stumbled upon this problem with
this code:

enum sortPred = q{a.hits > b.hits || (a.hits == b.hits && a.value <
b.value)};
all.sort!sortPred();

(I know about multiSort, but it currently seems to be buggy - I do have
a test case, but it has a few million entries and I need to reduce it.)

if (a[i] < a[i + 1]) { assert(!(a[i + 1] < a[i])); ... }
else { assert(a[i + 1] >= a[i]); ... }


Sort et al don't know how to check "a>=b"... and they can't use "a<b"
either, because !(a<b) && !(b<a) implies a==b.

It really implies "a and b are in the same equivalence class". Buildingon that notion, we can figure what's wrong with NaNs and how to assertagainst their failings.


Consider we define equiv(a, b) as (a, b) => !less(a, b) && !less(b, a).

If at least one of the numbers is NaN, all comparisons return false.That puts NaN in the same equivalence class with all numbers, includingnumbers that are deemed in distinct equivalence classes. That's whereNaNs mess things up, and that's what we need to assert. Consider threenumbers a, b, and c. Then the following must pass:


assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c));

("<=" on Booleans is actually implication.)

That test will fail with NaNs and should be part of isSorted, sort etc.

We should also check such as:

assert(!less(a, a));
assert(less(a, b) <= !less(b, a));

These should be checked in sort only; isSorted does not need these.


Andrei

Re: Sorting floating-point values, and NaN

Reply via email to