Here's my problem: NaNs. Most real-world data that one would interrogate
is filled with them. The typical stats package has a global switch
named something like rm_NaNs; if rm_NaNs==0, then most functions (min, max,
variance, et cetera) will return NaN if any element of the input is NaN, and
if rm_NaNs==1, then these functions auto-prune by guarding every use
of x with something like:

    if (!rm_NaNs || !gsl_isnan(x)) {
        /* use x */
    }

So has the GSL team considered including such a flag in the GSL? As above,
fixing the code in most cases would be a trivial one-line insertion,
but are there other reasons for not adding a global gsl_rm_NaNs variable?
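
To make the proposal concrete, here is a minimal sketch of what such a
flag-controlled function might look like. The names rm_NaNs and
mean_with_flag are hypothetical and not part of GSL, and the standard
isnan() from <math.h> stands in for gsl_isnan() so that the sketch
compiles without the library.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical global switch, mirroring the rm_NaNs flag described above.
   It is not part of GSL. */
static int rm_NaNs = 0;

/* Mean of data[0..n-1].
   rm_NaNs == 0: any NaN in the input propagates to the result.
   rm_NaNs == 1: NaN elements are skipped; the result is NaN only if
                 no usable element is left. */
static double mean_with_flag(const double *data, size_t n)
{
    double sum = 0.0;
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        double x = data[i];
        if (!rm_NaNs || !isnan(x)) {   /* the one-line guard */
            sum += x;
            used++;
        }
    }
    return used > 0 ? sum / (double)used : NAN;
}
```

With the input {1.0, NAN, 3.0}, this returns NaN when rm_NaNs is 0 and
2.0 when rm_NaNs is 1. Note that the divisor has to count only the
elements actually used, so the change is slightly more than one line
for anything that depends on the sample size.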
Hello,
Interesting subject. Generally we don't interpret a NaN as a missing
value in GSL, because of the risk of confusing the two meanings. In GSL
a NaN always indicates a numerical error and should always be propagated
so that it is not lost or hidden in some way.
I have looked at adding support for an "NA" (not available) value in the
past, as in R and Octave, but decided against it ("NA" is a NaN with a
specific bit pattern in an empty part of the IEEE fields). The problems
with it were:
1) It is a non-standard usage. Operations that mix NA and NaN values
can convert one into the other, e.g. x+NA could come out as NaN or NA.
2) It only works for floating point, whereas we'd really want a uniform
interface for all types. For integers R uses INT_MIN as NA, but that is
not really an option for GSL.
I think these limitations are less of a problem in an application like R
or Octave, where all the data is under the control of the environment
itself, but they make the approach unsuitable for a general C library.
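
The first limitation can be illustrated with a small sketch. It tags a
quiet NaN with a payload and then tests for that tag. The bit pattern,
the payload value 1954 and the names make_na and is_na are assumptions
chosen for illustration; this is not GSL code, and it assumes IEEE 754
doubles.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Illustrative bit pattern: a quiet NaN whose low 32 bits hold 1954
   (0x7A2). The tag value is an assumption made for this sketch. */
#define NA_BITS UINT64_C(0x7FF80000000007A2)

/* Build the tagged NaN by copying the bit pattern into a double. */
static double make_na(void)
{
    uint64_t bits = NA_BITS;
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* True only for a NaN carrying the tag, not for an ordinary NaN. */
static int is_na(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return isnan(x) && (bits & UINT64_C(0xFFFFFFFF)) == UINT64_C(0x7A2);
}
```

Here is_na(make_na()) is true and is_na(NAN) is false, but the result
of an expression such as 1.0 + make_na() is only guaranteed to be some
NaN. Whether the tag survives arithmetic, or which payload wins when an
NA meets an ordinary NaN, depends on the hardware and compiler, which
is the conversion problem described in point 1.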
In terms of adding support for missing values in GSL I can see one way
that would fit with another missing feature -- namely, online updating
of statistics. If there were functions for online updating of means,
standard deviations, etc. from individual data points, the user could
control which values were passed or discarded, at the cost of some
function call overhead. The
alternative would be passing an additional user-defined selection
function argument, as in the n-tuples module, which could be used to
drop selected values (could also be useful for trimming tails etc).
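
Both ideas can be sketched in a few lines. The accumulator below uses
Welford's algorithm, chosen here as one common way to update a mean and
variance one point at a time. All names (running_stats, running_add,
select_fn and so on) are hypothetical and are not GSL functions; the
selection callback is only loosely modelled on the one in the n-tuples
module.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical accumulator for online updating; not a GSL type. */
typedef struct {
    size_t n;      /* number of points accepted so far */
    double mean;   /* running mean */
    double m2;     /* running sum of squared deviations from the mean */
} running_stats;

static void running_init(running_stats *s)
{
    s->n = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Welford's update for a single data point. */
static void running_add(running_stats *s, double x)
{
    double delta = x - s->mean;
    s->n++;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
}

static double running_mean(const running_stats *s)
{
    return s->n > 0 ? s->mean : NAN;
}

/* Sample variance (n - 1 denominator); its square root is the sd. */
static double running_variance(const running_stats *s)
{
    return s->n > 1 ? s->m2 / (double)(s->n - 1) : NAN;
}

/* Alternative: a user-defined selection function decides which values
   are kept. It returns nonzero to keep x. */
typedef int (*select_fn)(double x, void *params);

static void running_add_selected(running_stats *s, const double *data,
                                 size_t n, select_fn keep, void *params)
{
    for (size_t i = 0; i < n; i++) {
        if (keep(data[i], params))
            running_add(s, data[i]);
    }
}

/* Example selection function: drop NaNs, keep everything else. */
static int not_nan(double x, void *params)
{
    (void)params;   /* unused */
    return !isnan(x);
}
```

A caller who wants NaNs dropped either tests each value before calling
running_add, or passes not_nan to running_add_selected. A different
selection function, for example one that rejects values outside a
range passed through params, would handle trimming tails. In both
cases the library itself never has to decide what a NaN means.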
--
Brian Gough
Network Theory Ltd,
Publishing the GSL Manual - http://www.network-theory.co.uk/gsl/manual/