[ 
https://issues.apache.org/jira/browse/STATISTICS-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Herbert updated STATISTICS-59:
-----------------------------------
    Attachment: pareto.png

> Correct Pareto distribution sampling with extreme shape parameter
> -----------------------------------------------------------------
>
>                 Key: STATISTICS-59
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-59
>             Project: Commons Statistics
>          Issue Type: Improvement
>          Components: distribution
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Minor
>             Fix For: 1.0
>
>         Attachments: pareto.png
>
>
> The Pareto distribution has CDF:
> {noformat}
>              ( scale )^shape
> CDF(x) = 1 - ( ----- )
>              (   x   ){noformat}
> This is inverted using high precision Math functions to support very small p 
> values:
> {noformat}
> x = scale / exp(log(1 - p) / shape)
>   = scale / Math.exp(Math.log1p(-p) / shape);{noformat}
> This is sampled using inverse transform sampling as:
> {noformat}
> x = scale / (1 - p)^(1 / shape)
>   = scale / Math.pow(1 - p, 1 / shape){noformat}
> This is fast as it requires a single call to Math.pow. It must only handle 
> p-values down to 2^-53 as sampling generates p as one of the 2^53 dyadic 
> rationals in [0, 1).
> However it has some issues when the shape parameter is extreme: either shape 
> is infinite or 1 / shape is infinite.
> Here is a table of the inverse CDF and the sample value for scale = 1 and an 
> extreme shape. p has been set using the most extreme values from the dyadic 
> rationals (0, 2^-53, 1 - 2^-53, 1):
> ||Shape||p||icdf(p)||sample||
> |Infinity|0.0|1.0|1.0|
> |Infinity|1.1102230246251565E-16|1.0|1.0|
> |Infinity|0.9999999999999999|1.0|1.0|
> |Infinity|1.0|Infinity|1.0|
> |4.9E-324|0.0|1.0|NaN|
> |4.9E-324|1.1102230246251565E-16|Infinity|Infinity|
> |4.9E-324|0.9999999999999999|Infinity|Infinity|
> |4.9E-324|1.0|Infinity|Infinity|
> When 1 / shape is infinite the NaN occurs when Math.pow(1, Infinity) == NaN. 
> In this case sampling inversion is an error.
> When shape is infinite the mismatch occurs when Math.pow(0, 0) == 1 and the 
> shape is returned rather than the distribution upper bound. This is because 
> the inverse CDF detects this edge case when the input p=1. In this case pure 
> inversion of the CDF is creating an outlier and the sampling inversion is 
> more suitable.
> The sampling should be updated to avoid the possibility of NaN generation and 
> ensure samples are returned without outliers from the main region of the CDF.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to