Re: the handiness of undef becoming NaN (when you want that)

Glenn Linderman Mon, 22 Oct 2001 10:47:14 -0700

Aaron Sherman wrote:

> I see your point, but going from: "you have to error-check to be
> sure that the average you get is valid" to "you get NaN and like it"
> is a bit steep.


"you get NaN and like it" only happens when you put garbage in... and get garbage
out.

Yes, NaN is garbage.  But when it doesn't happen, you would have reasonable
confidence that the input is not garbage even _without_ writing the error checking
code.

So I see the tradeoff as being the following:

"Automatic NaN conversions for garbage in" allows me to avoid writing the error
checking code for cases where good data is coming in, yet have confidence when I get
non-NaN output that the data was good (to the extent that numbers appeared where
expected in strings).

vs.

"Automatic conversion of garbage to zero" allows me to get good results for cases
where good data is coming in, but for garbage in I get what you call "meaningful
results" without any clue as to whether the meaningful results resulted from .005%
garbage, or 99.995% garbage, or any other % garbage.

So laziness and automatic NaN conversion allow me to write the simple solution, and
if all my assumptions about the data are correct, I get good results.  But if they
aren't, I get NaN, and know I need to write a more careful program, with more input
validity checking.

Laziness and zero conversion allow me to produce "meaningful results" with no clue
as to just how much meaning they actually have.

Unless, of course, I happen to divide by one of those garbage inputs, and get a
"meaningful" divide by zero error.  How do you produce "meaningful results" in that
case?
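
A minimal sketch of the tradeoff above, written in Python only because it has a
real NaN to experiment with; it is not Perl 6 syntax.  The names to_number and
average, and the sample fields, are made up for illustration.

```python
import math

def to_number(text, on_failure):
    """Convert text to a float; return on_failure when it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return on_failure

def average(fields, on_failure):
    """Average the fields, substituting on_failure for each garbage field."""
    values = [to_number(f, on_failure) for f in fields]
    return sum(values) / len(values)

# Hypothetical input: three good fields and one garbage field.
fields = ["10", "20", "oops", "30"]

avg_nan = average(fields, math.nan)   # NaN: the garbage is visible in the result
avg_zero = average(fields, 0.0)       # 15.0: plausible, but the good fields average 20.0

# Dividing by a garbage field.
ratio_nan = to_number("10", math.nan) / to_number("oops", math.nan)  # NaN again
try:
    ratio_zero = to_number("10", 0.0) / to_number("oops", 0.0)
except ZeroDivisionError:
    ratio_zero = "divide by zero"     # zero conversion fails loudly only here
```

With the NaN policy both results announce the bad field; with the zero policy the
average is quietly wrong and only the division complains.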

> For "Dallas" you get 8, and for "Dr. Who" you get NaN!

For the present technique of conversion to zero, is the result for "Dallas" or "Dr.
Who" a better result?  Is a garbage 0 or a garbage 8 more "meaningful" to you?

> Is more error checking good? Yes. Is screwing the user who doesn't
> error check the right answer? Not in my Perl programming experience....

We can agree more error checking is good.

It is not clear to me that producing NaN from garbage in should be considered
"screwing the user".  I'd call it alerting the user that he has some garbage in, and
that he needs to enhance his code (or data source) to deal with it to produce what I
would call meaningful results -- some statistics on % data valid or some such thing.
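
As a rough illustration of the "more careful program" (again in Python as a
stand-in, with the hypothetical helper name parse_fields and made-up sample data),
the enhanced code might keep the rejected fields and report how much of the input
was valid:

```python
def parse_fields(fields):
    """Split fields into parsed numbers and the raw strings that failed to parse."""
    valid, rejected = [], []
    for field in fields:
        try:
            valid.append(float(field))
        except ValueError:
            rejected.append(field)
    return valid, rejected

# Hypothetical input: three good fields and one garbage field.
fields = ["10", "20", "oops", "30"]

valid, rejected = parse_fields(fields)
percent_valid = 100.0 * len(valid) / len(fields)
mean = sum(valid) / len(valid) if valid else None

print(f"mean of valid fields: {mean}")
print(f"{percent_valid:.1f}% of fields were valid; rejected: {rejected}")
```

The average now comes with a statement of how much of the data it actually
covers.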

> > Yes, it takes a few extra lines to code, but adds a significant amount of
> > surety to the usefulness of the results.
>
> But converting to NaN does *not*.

I suspect you are reacting here to a probable misreading of my statement above.
Note, however, that I did not say that converting to NaN adds usefulness to the
results.  I said it adds "surety" to the usefulness of the results.  When you get
NaN, you are very, very sure that something went wrong, as NaN is seldom, if ever,
useful as a result.

So converting to NaN does add a significant amount of _surety_ to the usefulness of
the results.  You know then, that the results are not useful, because you had garbage
in.

> Great, sounds-like pragma territory. I could live with a:
>
>         use string_to_nan;
>
> I have a hard time living with:
>
>         use string_to_zero;
>
> and a NaN default.

If I had to choose among those, I'd personally prefer to live with the second.
However, there are other options, some of which have been mentioned in related
threads.

The option I like is to have Perl 6 provide a selection of different string methods
for extracting the numeric values, for a variety of types of numeric values.  For
example, one method might accept only integers, another method might allow complex
numbers, another method might allow metric suffixes for scaling, another method might
allow floating point numbers, another might allow decimal numbers without exponents,
another might allow full numeric expression evaluation (numeric constants and
operators only).  With such an option, you could choose the type of conversion based
on the type of data you expect.  I could see each of those conversion functions
taking an optional parameter defining what to return if the conversion fails.
Typical values for the parameter might be undef, 0, or NaN, and I'd recommend that
omitting the parameter cause the conversion to return undef.  That result is easily
detectable by code that cares, converts to zero for some cases of meaningful results
(when they really might be), and is compatible with most cases of not supplying a
parameter to a method.
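
A sketch of what such a family of conversions could look like, in Python rather
than Perl 6 since no such Perl 6 methods exist yet.  The function names
(as_integer, as_decimal, as_scaled), the suffix table, and the on_failure
parameter are all hypothetical; Python's None stands in for undef.

```python
import math
import re

_INTEGER = re.compile(r"[+-]?[0-9]+")
_DECIMAL = re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)")
_SUFFIXES = {"k": 1e3, "M": 1e6, "G": 1e9}   # assumed, deliberately small table

def as_integer(text, on_failure=None):
    """Accept only an optionally signed run of digits."""
    s = text.strip()
    return int(s) if _INTEGER.fullmatch(s) else on_failure

def as_decimal(text, on_failure=None):
    """Accept decimal numbers without exponents."""
    s = text.strip()
    return float(s) if _DECIMAL.fullmatch(s) else on_failure

def as_scaled(text, on_failure=None):
    """Accept a decimal number with an optional metric suffix for scaling."""
    s = text.strip()
    if s and s[-1] in _SUFFIXES:
        base = as_decimal(s[:-1])
        return on_failure if base is None else base * _SUFFIXES[s[-1]]
    return as_decimal(s, on_failure)

# The caller picks the failure value that suits the job.
strict = as_integer("4.2")               # None: detectable by code that cares
lenient = as_integer("4.2", 0)           # 0: the old convert-to-zero behaviour
loud = as_decimal("1e5", math.nan)       # NaN: exponents are not accepted here
scaled = as_scaled("2k")                 # 2000.0
```

Each call site states both the kind of number it expects and what it wants back
when the string does not contain one.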

--
Glenn
=====
Due to the current economic situation, the light at the
end of the tunnel will be turned off until further notice.
