[
https://issues.apache.org/jira/browse/MATH-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526002#comment-14526002
]
Otmar Ertl commented on MATH-1220:
----------------------------------
Yes, my proposed method outperforms the rejection inversion method for large
exponents. However, I suppose large exponents (greater than 10) are not very
interesting for data modelling. The exponent is usually close to 1 or even less
than 1. See for example following paper about word frequencies
http://colala.bcs.rochester.edu/papers/piantadosi2014zipfs.pdf. The original
rejection inversion method cannot handle exponents in that range. The second
patch contains an adapted rejection inversion algorithm, that covers all
non-negative exponents. (I still favor to include 0 as allowed value for the
exponent. It is easier for data modelling, if you are allowed to choose the
parameter from an closed interval. Furthermore, exponent equal to 0 represents
a meaningful distribution, since it corresponds to a uniform distribution.)
> More efficient sample() method for ZipfDistribution
> ---------------------------------------------------
>
> Key: MATH-1220
> URL: https://issues.apache.org/jira/browse/MATH-1220
> Project: Commons Math
> Issue Type: Improvement
> Reporter: Otmar Ertl
> Fix For: 4.0, 3.6
>
> Attachments: patch_v1, patch_v2
>
>
> Currently, sampling from a ZipfDistribution is very inefficient. Random
> values are generated by inverting the CDF. However, the current
> implementation uses O(N) power function evaluations to calculate the CDF for
> some point. (Here N is the number of points of the Zipf distribution.) I
> propose to use rejection sampling instead, which allows the generation of a
> single random value in constant time.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)