[
https://issues.apache.org/jira/browse/MATH-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520897#comment-14520897
]
Thomas Neidhart edited comment on MATH-1220 at 4/30/15 5:37 AM:
----------------------------------------------------------------
{quote}
Furthermore, I would allow the exponent to be non-negative. Currently, it is
restricted to positive values.
{quote}
I am not sure if we should do this. Normally in literature and implementations,
the exponent is even further restricted (exp > 1), and if you look at the pmf
and cdf functions for exponents < 1 you can see that the resulting distribution
is quite distorted. I guess the typical implementation uses the Riemann zeta
function which does not converge for exponents < 1.
btw. there is something in your implementation that I do not understand:
{code}
* An instrumental distribution g(k) is used to generate random values by
rejection sampling.
* g(k) is defined as g(1):= 1 and g(k) := I(-s,k-1/2,k+1/2) for k larger
than 1,
* where s denotes the exponent of the Zipf distribution and I(r,a,b) is
the integral of x^r for x from a to b.
* Since 1^x^s is a convex function, Jensens's inequality gives
* I(-s,k-1/2,k+1/2) >= 1/k^s for all positive k and non-negative s.
* In order to limit the rejection rate for large exponents s,
* the instrumental distribution weight is differently defined for value 1.
*/
@Override
public int sample() {
if (Double.isNaN(instrumentalDistributionTailWeight)) {
instrumentalDistributionTailWeight =
integratePowerFunction(-exponent, 1.5, numberOfElements+0.5);
}
{code}
talks about integrating the power function in the range [k-0.5, k+0.5] but in
fact uses 1.5 and k+0.5 as bounds. Where does the 1.5 come from?
was (Author: tn):
{comment}
Furthermore, I would allow the exponent to be non-negative. Currently, it is
restricted to positive values.
{comment}
I am not sure if we should do this. Normally in literature and implementations,
the exponent is even further restricted (exp > 1), and if you look at the pmf
and cdf functions for exponents < 1 you can see that the resulting distribution
is quite distorted. I guess the typical implementation uses the Riemann zeta
function which does not converge for exponents < 1.
btw. there is something in your implementation that I do not understand:
{code}
* An instrumental distribution g(k) is used to generate random values by
rejection sampling.
* g(k) is defined as g(1):= 1 and g(k) := I(-s,k-1/2,k+1/2) for k larger
than 1,
* where s denotes the exponent of the Zipf distribution and I(r,a,b) is
the integral of x^r for x from a to b.
* Since 1^x^s is a convex function, Jensens's inequality gives
* I(-s,k-1/2,k+1/2) >= 1/k^s for all positive k and non-negative s.
* In order to limit the rejection rate for large exponents s,
* the instrumental distribution weight is differently defined for value 1.
*/
@Override
public int sample() {
if (Double.isNaN(instrumentalDistributionTailWeight)) {
instrumentalDistributionTailWeight =
integratePowerFunction(-exponent, 1.5, numberOfElements+0.5);
}
{code}
talks about integrating the power function in the range [k-0.5, k+0.5] but in
fact uses 1.5 and k+0.5 as bounds. Where does the 1.5 come from?
> More efficient sample() method for ZipfDistribution
> ---------------------------------------------------
>
> Key: MATH-1220
> URL: https://issues.apache.org/jira/browse/MATH-1220
> Project: Commons Math
> Issue Type: Improvement
> Reporter: Otmar Ertl
> Attachments: patch_v1
>
>
> Currently, sampling from a ZipfDistribution is very inefficient. Random
> values are generated by inverting the CDF. However, the current
> implementation uses O(N) power function evaluations to calculate the CDF for
> some point. (Here N is the number of points of the Zipf distribution.) I
> propose to use rejection sampling instead, which allows the generation of a
> single random value in constant time.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)