Re: [Numpy-discussion] Pull request review #3770: Trapezoidal distribution

2013-09-23 Thread Jeremy Hetzel
On Sun, Sep 22, 2013 at 9:47 AM, Mark Szepieniec mszep...@gmail.com wrote:


 On Sun, Sep 22, 2013 at 1:24 PM, josef.p...@gmail.com wrote:


  I don't see a reason that numpy.random shouldn't get new
 distributions. It would also be useful to add the corresponding
 distribution to scipy.stats.


I have the pdf, cdf, and inverse cdf for the generalized trapezoidal. I've
looked through the other distributions at scipy.stats and adding this one
should not be difficult. I'll work on it next.



 naming: n, m would indicate to me that they are integers, but they
 can be floats (> 0)
 alpha, beta ?


The three additional parameters for growth rate, decay rate, and boundary
ratio are floats > 0. I renamed them from `m`, `n`, and `alpha` (which is
how they're parameterized in the published probability density function) to
simply `growth`, `decay`, and `ratio`.  Does that fit into the NumPy style?
It feels intuitive to me.
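
To make the renaming concrete, here is a minimal sketch of the density from the docstring in the original post, written with the `growth`, `decay`, and `ratio` keyword names. The function name `gen_trapezoid_pdf` is hypothetical and not part of the pull request, and the sketch assumes `left < mode1 < mode2 < right` so that no stage has zero width.

```python
import numpy as np


def gen_trapezoid_pdf(x, left, mode1, mode2, right,
                      growth=2.0, decay=2.0, ratio=1.0):
    """Hypothetical helper: pdf of the generalized trapezoidal distribution.

    Assumes left < mode1 < mode2 < right and growth, decay, ratio > 0.
    """
    x = np.asarray(x, dtype=float)
    a, b, c, d = left, mode1, mode2, right
    m, n, alpha = growth, decay, ratio

    # Normalizing constant C(Theta) as given in the docstring.
    norm = 2.0 * m * n / (
        2.0 * alpha * (b - a) * n
        + (alpha + 1.0) * (c - b) * m * n
        + 2.0 * (d - c) * m
    )

    # Every stage is evaluated for every x, so clip the power bases to
    # [0, 1] to avoid NaN from fractional powers of negative numbers.
    # Zero bases with exponents below zero give inf; silence that warning.
    with np.errstate(divide="ignore"):
        rise = alpha * np.clip((x - a) / (b - a), 0.0, 1.0) ** (m - 1.0)
        fall = np.clip((d - x) / (d - c), 0.0, 1.0) ** (n - 1.0)
    flat = (1.0 - alpha) * (x - b) / (c - b) + alpha

    stages = [(x >= a) & (x < b), (x >= b) & (x < c), (x >= c) & (x <= d)]
    return norm * np.select(stages, [rise, flat, fall], default=0.0)
```

With the defaults (`growth=2`, `decay=2`, `ratio=1`) this reduces to the ordinary trapezoid, which is a quick way to sanity-check the names against the published `m`, `n`, `alpha`.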




 Is there a standard version, e.g. left=0, right=1, mode1=?, ... ?

 In scipy.stats.distribution we are required to use a location, scale
 parameterization, where loc shifts the distribution and scale
 stretches it.
 Is there a standard parameterization for that?, for example
 left = loc = 0 (default) or left = loc / scale = 0
 right = scale = 1 (default)
 mode1_relative = mode1 / scale
 mode2_relative = mode2 / scale
 n, m unchanged no defaults

 just checked:
 your naming corresponds to triangular, and triang in scipy has the
 corresponding loc-scale parameterization.


Thanks. There is no standard version of the distribution that I'm aware of,
but for the purposes of scipy.stats, left=0, right=1 and mode1, mode2 being
either 0.25, 0.75 or 1/3, 2/3, seem reasonable. I'll give more thought to
the location and scale and send an email to scipy-dev if I need guidance.
Looking at scipy.stats.triang, my initial thought is:
left_relative = loc
mode1_relative = loc + mode1*scale
mode2_relative = loc + mode2*scale
right_relative = loc + scale
growth, decay, and ratio are unchanged.
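
As a rough illustration of that mapping (a sketch only; the helper name `to_absolute` is made up for this email and is not a scipy.stats API), the standardized modes on [0, 1] are shifted by `loc` and stretched by `scale`:

```python
def to_absolute(loc, scale, mode1, mode2):
    """Hypothetical helper: map a standardized trapezoid on [0, 1]
    to absolute (left, mode1, mode2, right) values."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    if not 0.0 <= mode1 <= mode2 <= 1.0:
        raise ValueError("need 0 <= mode1 <= mode2 <= 1")
    left = loc
    right = loc + scale
    return left, loc + mode1 * scale, loc + mode2 * scale, right


print(to_absolute(10.0, 4.0, 0.25, 0.75))  # (10.0, 11.0, 13.0, 14.0)
```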



 I think you need to s/first/second in the description of the mode2
 parameter?


Thanks for catching that. Fixed in a recent commit. mode2 should be the
second peak of the distribution.


Jeremy
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Pull request review #3770: Trapezoidal distribution

2013-09-21 Thread Jeremy Hetzel
I've added a trapezoidal distribution to numpy.random for consideration,
pull request 3770:
https://github.com/numpy/numpy/pull/3770

Similar to the triangular distribution, the trapezoidal distribution may be
used where the underlying distribution is not known, but some knowledge of
the limits and mode exists. The trapezoidal distribution generalizes the
triangular distribution by allowing the modal values to be expressed as a
range instead of a point estimate.

The trapezoidal distribution implemented, known as the generalized
trapezoidal distribution, has three additional parameters: growth, decay,
and boundary ratio. Adjusting these from the default values creates
trapezoidal-like distributions with non-linear behavior. Examples can be
seen in an R vignette (
http://cran.r-project.org/web/packages/trapezoid/vignettes/trapezoid.pdf ),
as well as these papers by J.R. van Dorp and colleagues:

1) van Dorp, J. R. and Kotz, S. (2003) Generalized trapezoidal
distributions. Metrika. 58(1):85–97. Preprint available:
http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/Metrika2003VanDorp.pdf

2) van Dorp, J. R., Rambaud, S.C., Perez, J. G., and Pleguezuelo, R. H.
(2007) An elicitation procedure for the generalized trapezoidal
distribution with a uniform central stage. Decision Analysis Journal.
4:156–166. Preprint available:
http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/DA2007.pdf

The docstring for the proposed numpy.random.trapezoidal() is as follows:


trapezoidal(left, mode1, mode2, right, size=None, m=2, n=2, alpha=1)

Draw samples from the generalized trapezoidal distribution.

The trapezoidal distribution is defined by minimum (``left``), lower mode
(``mode1``), upper mode (``mode2``), and maximum (``right``) parameters.
The generalized trapezoidal distribution adds three more parameters: the
growth rate (``m``), decay rate (``n``), and boundary ratio (``alpha``)
parameters. The generalized trapezoidal distribution simplifies to the
trapezoidal distribution when ``m = n = 2`` and ``alpha = 1``. It further
simplifies to a triangular distribution when ``mode1 == mode2``.

Parameters
----------
left : scalar
    Lower limit.
mode1 : scalar
    The value where the first peak of the distribution occurs.
    The value should fulfill the condition ``left <= mode1 <= mode2``.
mode2 : scalar
    The value where the second peak of the distribution occurs.
    The value should fulfill the condition ``mode1 <= mode2 <= right``.
right : scalar
    Upper limit, should be larger than or equal to `mode2`.
size : int or tuple of ints, optional
    Output shape. Default is None, in which case a single value is
    returned.
m : scalar, optional
    Growth parameter.
n : scalar, optional
    Decay parameter.
alpha : scalar, optional
    Boundary ratio parameter.

Returns
-------
samples : ndarray or scalar
    The returned samples all lie in the interval [left, right].

Notes
-----
With ``left``, ``mode1``, ``mode2``, ``right``, ``m``, ``n``, and ``alpha``
parametrized as :math:`a, b, c, d, m, n, \\text{ and } \\alpha`,
respectively, the probability density function for the generalized
trapezoidal distribution is

.. math::

    f_{\\scriptscriptstyle X}(x \\mid \\Theta) = \\mathcal{C}(\\Theta) \\times
    \\begin{cases}
    \\alpha \\left(\\frac{x - a}{b - a} \\right)^{m - 1},
        & \\text{for } a \\leq x < b \\\\
    (1 - \\alpha) \\left(\\frac{x - b}{c - b} \\right) + \\alpha,
        & \\text{for } b \\leq x < c \\\\
    \\left(\\frac{d - x}{d - c} \\right)^{n - 1},
        & \\text{for } c \\leq x \\leq d
    \\end{cases}

with the normalizing constant :math:`\\mathcal{C}(\\Theta)` defined as

.. math::

    \\mathcal{C}(\\Theta) =
    \\frac{2mn}
    {2 \\alpha \\left(b - a\\right) n +
    \\left(\\alpha + 1 \\right) \\left(c - b \\right)mn +
    2 \\left(d - c \\right)m}

and where the parameter vector :math:`\\Theta = \\{a, b, c, d, m, n,
\\alpha \\}, \\text{ } a \\leq b \\leq c \\leq d, \\text{ and } m, n,
\\alpha > 0`.

Similar to the triangular distribution, the trapezoidal distribution may
be used where the underlying distribution is not known, but some knowledge
of the limits and mode exists. The trapezoidal distribution generalizes
the triangular distribution by allowing the modal values to be expressed
as a range instead of a point estimate. The growth, decay, and boundary
ratio parameters of the generalized trapezoidal distribution further
allow for