[ 
https://issues.apache.org/jira/browse/MATH-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526002#comment-14526002
 ] 

Otmar Ertl commented on MATH-1220:
----------------------------------

Yes, my proposed method outperforms the rejection inversion method for large 
exponents. However, I suppose large exponents (greater than 10) are not very 
interesting for data modelling. The exponent is usually close to 1 or even less 
than 1. See for example following paper about word frequencies 
http://colala.bcs.rochester.edu/papers/piantadosi2014zipfs.pdf. The original 
rejection inversion method cannot handle exponents in that range. The second 
patch contains an adapted rejection inversion algorithm, that covers all 
non-negative exponents. (I still favor to include 0 as allowed value for the 
exponent. It is easier for data modelling, if you are allowed to choose the 
parameter from an closed interval. Furthermore, exponent equal to 0 represents 
a meaningful distribution, since it corresponds to a uniform distribution.)

> More efficient sample() method for ZipfDistribution
> ---------------------------------------------------
>
>                 Key: MATH-1220
>                 URL: https://issues.apache.org/jira/browse/MATH-1220
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Otmar Ertl
>             Fix For: 4.0, 3.6
>
>         Attachments: patch_v1, patch_v2
>
>
> Currently, sampling from a ZipfDistribution is very inefficient. Random 
> values are generated by inverting the CDF. However, the current 
> implementation uses O(N) power function evaluations to calculate the CDF for 
> some point. (Here N is the number of points of the Zipf distribution.) I 
> propose to use rejection sampling instead, which allows the generation of a 
> single random value in constant time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to