Re: [Statistics] Convention when outside support?

Alex Herbert Fri, 29 Nov 2019 09:42:09 -0800

On 29/11/2019 16:48, Gilles Sadowski wrote:

Hello.


For all implemented distributions, what convention should be adopted
when methods
  * density(x)
  * logDensity(x)
  * cumulativeProbability(x)
are called with "x" out of the "support" bounds?

Currently some (but not all[1]) are documented to return "NaN".
An alternative could be to throw an exception.

The convention in the java.lang.Math class is to return NaN for things that do not make sense, e.g.


Math.log(-1)
Math.asin(4)

This leaves it as the responsibility of the caller to know when a bad value may be passed in, and to check the results.
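A minimal Java sketch of the java.lang.Math convention and the caller-side check it relies on. The checkedLog helper is hypothetical and only shows what "the caller checks the result" looks like in practice:

```java
/** Shows that java.lang.Math reports undefined results as NaN rather than throwing. */
public class NaNConvention {

    /** Hypothetical helper: the check a careful caller has to add themselves. */
    static double checkedLog(double x) {
        double r = Math.log(x);
        if (Double.isNaN(r)) {
            throw new IllegalArgumentException("log undefined for x = " + x);
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(Math.log(-1));    // NaN, no exception
        System.out.println(Math.asin(4));    // NaN, no exception
        System.out.println(checkedLog(1.0)); // 0.0
    }
}
```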

Unfortunately, that leaves open the issue that not everyone will do that, and so their program can be brought to a stop by NaN values that may have appeared some way further back in the computation.

Throwing an exception seems to be the only way to preserve the stack trace of where the computation went wrong.
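A small illustration of that trade-off, as a sketch only: a hypothetical three-step computation in which a NaN produced at the first step passes silently through the rest, next to a fail-fast version whose exception points at the step that went wrong:

```java
/** Hypothetical pipeline contrasting silent NaN propagation with failing fast. */
public class NaNPropagation {

    static double pipeline(double x) {
        double a = Math.log(x);      // NaN if x < 0, but nothing is raised here
        double b = a * 2.0 + 1.0;    // NaN propagates through the arithmetic
        return Math.exp(b);          // still NaN; where it came from is lost
    }

    static double pipelineFailFast(double x) {
        if (x < 0) {
            // The stack trace of this exception identifies the offending step.
            throw new IllegalArgumentException("log undefined for x = " + x);
        }
        double a = Math.log(x);
        double b = a * 2.0 + 1.0;
        return Math.exp(b);
    }

    public static void main(String[] args) {
        System.out.println(pipeline(-1.0)); // NaN, with no hint of its origin
        try {
            pipelineFailFast(-1.0);
        } catch (IllegalArgumentException e) {
            e.printStackTrace();            // points at pipelineFailFast
        }
    }
}
```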


So either case has merit.

What do other languages do? A few seem to return 0 for values outside the support.

I had a look at Python. Here there is not much consistency between scipy and the standard library:

>>> import math
>>> from scipy.stats import gamma
>>> gamma.pdf(0.5, 1.99)
0.3066586069413397
>>> gamma.pdf(-0.5, 1.99)
0.0
>>> gamma.logpdf(-0.5, 1.99)
-inf
>>> math.log(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: math domain error

So scipy returns 0 for the density function when outside the support. It returns -inf for the log of zero, but Python's math.log raises an exception for the log of zero.

In R the behaviour is the same as in Python, except that the log of zero is -Inf.


> dgamma(0, 2)
[1] 0
> dgamma(-1, 2)
[1] 0
> dgamma(-1, 2, log=TRUE)
[1] -Inf
> log(0)
[1] -Inf

So returning 0 is another option. However, this cannot distinguish a valid return of 0 from an error.
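To make the ambiguity concrete, here is a Java sketch of the "return 0 outside the support" convention, mirroring the R session above for a gamma distribution with shape 2 and scale 1, whose density is x * exp(-x) for x >= 0 because Gamma(2) = 1. The class is hypothetical, not a library API:

```java
/** Hypothetical gamma(shape = 2, scale = 1) using the scipy/R "zero outside support" rule. */
public class ZeroOutsideSupport {

    static double density(double x) {
        if (x < 0) {
            return 0.0;                      // outside the support: no error is signalled
        }
        return x * Math.exp(-x);             // f(x) = x e^{-x}
    }

    static double logDensity(double x) {
        if (x < 0) {
            return Double.NEGATIVE_INFINITY; // log(0), like scipy logpdf and R log=TRUE
        }
        return Math.log(x) - x;              // log(x e^{-x}); -Infinity at x = 0
    }

    public static void main(String[] args) {
        System.out.println(density(0.0));     // 0.0 (valid: x = 0 is in the support)
        System.out.println(density(-1.0));    // 0.0 (x = -1 is outside the support)
        System.out.println(logDensity(-1.0)); // -Infinity
    }
}
```

Both calls return 0.0, so the caller cannot tell the valid result from the out-of-support one.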

Note that if we did not have double as a return value, then throwing an exception would be the primary choice for signalling an error, as there is no NaN for other number types. However, there are documented cases in the JDK where computations that do not make sense avoid throwing exceptions, such as Math.abs(int) for Integer.MIN_VALUE, which still returns a negative value.
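The Math.abs case in a few lines of Java. For contrast, the sketch also calls Math.negateExact (available since Java 8), which throws ArithmeticException on overflow instead:

```java
/** Demonstrates the documented Math.abs(Integer.MIN_VALUE) behaviour. */
public class AbsOverflow {
    public static void main(String[] args) {
        // |Integer.MIN_VALUE| does not fit in an int, so Math.abs returns the argument unchanged.
        int a = Math.abs(Integer.MIN_VALUE);
        System.out.println(a);     // -2147483648
        System.out.println(a < 0); // true

        // The *Exact methods take the opposite approach and throw on overflow.
        try {
            Math.negateExact(Integer.MIN_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("negateExact threw: " + e.getMessage());
        }
    }
}
```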

I'm not a fan of static properties to configure the behaviour either way. I don't think using zero is a good idea, as it cannot signal that something is wrong.


I would favour one of the following:

- Provide alternative methods that either return NaN or throw

- Always return NaN (which seems more conventional in Java) and provide a wrapper distribution that wraps calls to density, logDensity and cumulativeProbability and throws an exception if the underlying distribution returns NaN.

- Always throw (which forces users into safe usage) and provide a wrapper distribution that wraps calls to density, logDensity and cumulativeProbability and returns NaN or zero if the underlying distribution throws.
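A rough Java sketch of the second option, "always return NaN, wrap to throw". The Distribution interface, the Exponential class and the NaNToException wrapper are all hypothetical names, not the Commons Statistics API. The exponential returns NaN below zero for all three methods only to follow the convention under discussion:

```java
/** Sketch: NaN-returning distribution plus a wrapper that converts NaN into an exception. */
public class ThrowingWrapperSketch {

    /** Hypothetical interface covering the three methods discussed. */
    interface Distribution {
        double density(double x);
        double logDensity(double x);
        double cumulativeProbability(double x);
    }

    /** Exponential(rate 1) returning NaN for x < 0. */
    static final class Exponential implements Distribution {
        public double density(double x) { return x < 0 ? Double.NaN : Math.exp(-x); }
        public double logDensity(double x) { return x < 0 ? Double.NaN : -x; }
        public double cumulativeProbability(double x) { return x < 0 ? Double.NaN : -Math.expm1(-x); }
    }

    /** Wrapper that throws if the underlying distribution returns NaN. */
    static final class NaNToException implements Distribution {
        private final Distribution delegate;

        NaNToException(Distribution delegate) { this.delegate = delegate; }

        private static double check(double value, String method, double x) {
            if (Double.isNaN(value)) {
                throw new IllegalArgumentException(method + " is undefined at x = " + x);
            }
            return value;
        }

        public double density(double x) { return check(delegate.density(x), "density", x); }
        public double logDensity(double x) { return check(delegate.logDensity(x), "logDensity", x); }
        public double cumulativeProbability(double x) {
            return check(delegate.cumulativeProbability(x), "cumulativeProbability", x);
        }
    }

    public static void main(String[] args) {
        Distribution raw = new Exponential();
        Distribution strict = new NaNToException(raw);
        System.out.println(raw.density(-1.0));   // NaN: the caller must remember to check
        System.out.println(strict.density(0.0)); // 1.0
        try {
            strict.density(-1.0);
        } catch (IllegalArgumentException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```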

Considering that creating a distribution with a bad value gives you an exception, whereas using a distribution with a bad value gives you NaN, it seems to me that throwing an exception may be the more sensible approach. A wrapper that guards against exceptions can be user-configurable to return NaN or zero.
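A sketch of that guarding wrapper, under the assumption that the distribution throws IllegalArgumentException outside its support. All names here are hypothetical. Only density is shown; logDensity and cumulativeProbability would follow the same pattern:

```java
/** Sketch: throwing distribution plus a wrapper that returns a user-chosen fallback. */
public class GuardingWrapperSketch {

    interface Density {
        double density(double x);
    }

    /** Exponential(rate 1) density that throws for x < 0. */
    static final class StrictExponential implements Density {
        public double density(double x) {
            if (x < 0) {
                throw new IllegalArgumentException("x = " + x + " is outside the support [0, inf)");
            }
            return Math.exp(-x);
        }
    }

    /** Converts the exception into a configurable value, e.g. Double.NaN or 0.0. */
    static final class Guarded implements Density {
        private final Density delegate;
        private final double fallback;

        Guarded(Density delegate, double fallback) {
            this.delegate = delegate;
            this.fallback = fallback;
        }

        public double density(double x) {
            try {
                return delegate.density(x);
            } catch (IllegalArgumentException e) {
                return fallback;
            }
        }
    }

    public static void main(String[] args) {
        Density strict = new StrictExponential();
        System.out.println(new Guarded(strict, Double.NaN).density(-1.0)); // NaN
        System.out.println(new Guarded(strict, 0.0).density(-1.0));        // 0.0
        System.out.println(new Guarded(strict, 0.0).density(0.0));         // 1.0
    }
}
```

The default stays safe, and whoever chooses the lenient behaviour has to opt in explicitly.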


Alex

Regards,
Gilles

[1] https://issues.apache.org/jira/projects/MATH/issues/MATH-1503

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
