On 29/11/2019 16:48, Gilles Sadowski wrote:
Hello.

For all implemented distributions, what convention should be adopted
when methods
  * density(x)
  * logDensity(x)
  * cumulativeProbability(x)
are called with "x" out of the "support" bounds?

Currently some (but not all[1]) are documented to return "NaN".
An alternative could be to throw an exception.

The convention in the java.lang.Math class is to return NaN for arguments outside a function's domain, e.g.

Math.log(-1)
Math.asin(4)

This makes the caller responsible for knowing when a bad value might be passed in, and for checking the result accordingly.

Unfortunately, not everyone will do that, so a program can fail because of NaN values that were introduced much earlier in the computation.

Throwing an exception seems to be the only way to preserve a stack trace showing where the computation went wrong.

So each option has merit.
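A minimal Java sketch contrasting the two conventions. `Math.log(-1)` returns NaN, which then flows silently through later arithmetic unless the caller checks `Double.isNaN`; `checkedLog` is a hypothetical throwing helper (not part of the JDK) that fails at the offending call, so its stack trace shows where the computation went wrong.

```java
/** Contrasts returning NaN (the java.lang.Math convention) with throwing. */
final class NanVersusException {

    /** Hypothetical throwing variant of Math.log; not part of any library. */
    static double checkedLog(double x) {
        if (!(x >= 0)) { // rejects negative values and NaN
            throw new IllegalArgumentException("log undefined for x = " + x);
        }
        return Math.log(x);
    }

    public static void main(String[] args) {
        // NaN convention: nothing is reported at the bad call...
        double bad = Math.log(-1);
        // ...and the NaN propagates silently through later arithmetic.
        double later = 2 * bad + Math.sqrt(16);
        System.out.println("later = " + later); // prints "later = NaN"
        // The caller has to remember to check explicitly.
        System.out.println("isNaN? " + Double.isNaN(later));

        // Exception convention: the failure surfaces at the offending call,
        // and the stack trace identifies where the computation went wrong.
        try {
            checkedLog(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
```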

What do other languages do? A few seem to return 0 for values outside the support.

I had a look at Python. There is not much consistency between scipy and the standard math module:

>>> import math
>>> from scipy.stats import gamma
>>> gamma.pdf(0.5, 1.99)
0.3066586069413397
>>> gamma.pdf(-0.5, 1.99)
0.0
>>> gamma.logpdf(-0.5, 1.99)
-inf
>>> math.log(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: math domain error

So scipy returns 0 from the density function outside the support, and -inf from the log density. However, Python's math.log raises an exception (ValueError) for the log of zero.

In R the behaviour is the same as in Python, except that log(0) returns -Inf instead of raising an error:

> dgamma(0, 2)
[1] 0
> dgamma(-1, 2)
[1] 0
> dgamma(-1, 2, log=TRUE)
[1] -Inf
> log(0)
[1] -Inf

So returning 0 is another option. However, it cannot distinguish a valid result of 0 (such as dgamma(0, 2) above, at the edge of the support) from an error.
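To illustrate the ambiguity, here is a hypothetical hand-written density for the gamma distribution with shape 2 and rate 1, f(x) = x exp(-x) (the same distribution as dgamma(x, 2) above), using the return-0-outside-the-support convention. `gammaShape2Density` is an illustrative name, not a library method.

```java
/** Shows that "return 0 outside the support" cannot be told apart from a genuine 0. */
final class ZeroIsAmbiguous {

    /** Hypothetical gamma(shape = 2, rate = 1) density: x * exp(-x) for x >= 0. */
    static double gammaShape2Density(double x) {
        if (x < 0) {
            return 0; // out of support: indistinguishable from a genuine zero
        }
        return x * Math.exp(-x);
    }

    public static void main(String[] args) {
        double atBoundary = gammaShape2Density(0);    // valid input, density is 0
        double outOfSupport = gammaShape2Density(-1); // invalid input, also 0
        System.out.println(atBoundary == outOfSupport); // prints "true"
    }
}
```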

Note that if the return type were not double, throwing an exception would be the primary choice for signalling an error, as other numeric types (e.g. int, long) have no NaN. However, the JDK documents cases where a computation that does not make sense still avoids throwing an exception, such as Math.abs(int) with Integer.MIN_VALUE, which returns a negative value.
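The Math.abs case is easy to reproduce; this sketch uses only java.lang.Math and Integer.

```java
/** A documented JDK case that returns a nonsensical value instead of throwing. */
final class AbsOverflow {
    public static void main(String[] args) {
        int r = Math.abs(Integer.MIN_VALUE);
        // The true result, 2^31, is not representable as an int,
        // so Math.abs returns Integer.MIN_VALUE itself, which is negative.
        System.out.println(r);                      // prints -2147483648
        System.out.println(r == Integer.MIN_VALUE); // prints true
    }
}
```

Java 15 later added Math.absExact(int), which throws ArithmeticException for this input: in effect, a throwing method provided alongside the non-throwing one.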

I'm not a fan of using static properties to configure the behaviour either way. I also don't think returning zero is a good idea, as it cannot signal that something is wrong.

I would favour one of the following:

- Provide alternative methods to return NaN or throw
- Always return NaN (which seems more conventional in Java) and provide a wrapper distribution that wraps calls to density, logDensity and cumulativeProbability and throws an exception if the underlying distribution returns NaN.
- Always throw (which forces safe usage) and provide a wrapper distribution that wraps calls to density, logDensity and cumulativeProbability and returns NaN or zero if the underlying distribution throws.
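A minimal sketch of the NaN-to-exception wrapper, assuming a hypothetical `Distribution` interface with the three methods under discussion (not the actual Commons API). `NanExponential` is a hypothetical exponential distribution with rate 1 that returns NaN outside its support, and `ThrowingDistribution` (also hypothetical) converts any NaN result into an IllegalArgumentException.

```java
/** Demo entry point (first class, so "java File.java" can launch it). */
final class ThrowingDemo {
    public static void main(String[] args) {
        Distribution d = new ThrowingDistribution(new NanExponential());
        System.out.println(d.density(0)); // prints 1.0
        try {
            d.density(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}

/** Hypothetical minimal distribution interface; not the actual Commons API. */
interface Distribution {
    double density(double x);
    double logDensity(double x);
    double cumulativeProbability(double x);
}

/** Hypothetical exponential (rate 1) following the "always return NaN" convention. */
final class NanExponential implements Distribution {
    public double density(double x) { return x < 0 ? Double.NaN : Math.exp(-x); }
    public double logDensity(double x) { return x < 0 ? Double.NaN : -x; }
    // 1 - exp(-x), computed accurately for small x via expm1
    public double cumulativeProbability(double x) { return x < 0 ? Double.NaN : -Math.expm1(-x); }
}

/** Hypothetical wrapper that turns NaN results from the delegate into exceptions. */
final class ThrowingDistribution implements Distribution {
    private final Distribution delegate;

    ThrowingDistribution(Distribution delegate) { this.delegate = delegate; }

    private static double check(String method, double x, double result) {
        if (Double.isNaN(result)) {
            throw new IllegalArgumentException(method + " undefined for x = " + x);
        }
        return result;
    }

    public double density(double x) { return check("density", x, delegate.density(x)); }
    public double logDensity(double x) { return check("logDensity", x, delegate.logDensity(x)); }
    public double cumulativeProbability(double x) {
        return check("cumulativeProbability", x, delegate.cumulativeProbability(x));
    }
}
```

Because the check is on the result, a NaN argument is also rejected, since the delegate returns NaN for it.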

Consider that creating a distribution with a bad parameter throws an exception, whereas calling one of its methods with a bad argument returns NaN. Given that inconsistency, throwing an exception seems to me the more sensible approach. A wrapper that guards against exceptions could be configured by the user to return NaN or zero.
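The exception-guarding wrapper could look like the following sketch, again assuming a hypothetical `Distribution` interface with density, logDensity and cumulativeProbability. `StrictExponential` (hypothetical) throws outside its support, and `GuardedDistribution` (hypothetical) substitutes a user-chosen fallback. For logDensity it uses the log of that fallback, so a zero fallback yields -Infinity, matching the scipy and R results above.

```java
/** Demo entry point (first class, so "java File.java" can launch it). */
final class GuardDemo {
    public static void main(String[] args) {
        Distribution strict = new StrictExponential();
        Distribution nanOnError = new GuardedDistribution(strict, Double.NaN);
        Distribution zeroOnError = new GuardedDistribution(strict, 0.0);
        System.out.println(nanOnError.density(-1));     // prints NaN
        System.out.println(zeroOnError.density(-1));    // prints 0.0
        System.out.println(zeroOnError.logDensity(-1)); // prints -Infinity
    }
}

/** Hypothetical minimal distribution interface; not the actual Commons API. */
interface Distribution {
    double density(double x);
    double logDensity(double x);
    double cumulativeProbability(double x);
}

/** Hypothetical exponential (rate 1) following the "always throw" convention. */
final class StrictExponential implements Distribution {
    private static void checkSupport(double x) {
        if (!(x >= 0)) { // also rejects NaN
            throw new IllegalArgumentException("x out of support [0, inf): " + x);
        }
    }

    public double density(double x) { checkSupport(x); return Math.exp(-x); }
    public double logDensity(double x) { checkSupport(x); return -x; }
    public double cumulativeProbability(double x) { checkSupport(x); return -Math.expm1(-x); }
}

/** Hypothetical wrapper that converts exceptions from the delegate into a fallback value. */
final class GuardedDistribution implements Distribution {
    private final Distribution delegate;
    private final double fallback;    // returned by density and cumulativeProbability
    private final double logFallback; // log of fallback: NaN -> NaN, 0 -> -Infinity

    GuardedDistribution(Distribution delegate, double fallback) {
        this.delegate = delegate;
        this.fallback = fallback;
        this.logFallback = Math.log(fallback);
    }

    public double density(double x) {
        try { return delegate.density(x); } catch (IllegalArgumentException e) { return fallback; }
    }

    public double logDensity(double x) {
        try { return delegate.logDensity(x); } catch (IllegalArgumentException e) { return logFallback; }
    }

    public double cumulativeProbability(double x) {
        try { return delegate.cumulativeProbability(x); } catch (IllegalArgumentException e) { return fallback; }
    }
}
```

One caveat of this sketch: a fixed fallback of zero is only correct for cumulativeProbability below the support; above the support the correct value would be 1.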

Alex
Regards,
Gilles

[1] https://issues.apache.org/jira/projects/MATH/issues/MATH-1503

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

