[jira] [Updated] (STATISTICS-59) Correct Pareto distribution sampling with extreme shape parameter

Alex Herbert (Jira) Thu, 24 Nov 2022 08:07:08 -0800


     [ https://issues.apache.org/jira/browse/STATISTICS-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alex Herbert updated STATISTICS-59:
-----------------------------------
    Description: 
The Pareto distribution has CDF:
{noformat}
             ( scale )^shape
CDF(x) = 1 - ( ----- )
             (   x   ){noformat}
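For reference, both inversions below follow from solving CDF(x) = p for x (for x >= scale):

```latex
p = 1 - \left(\frac{\mathrm{scale}}{x}\right)^{\mathrm{shape}}
\;\Longrightarrow\;
\left(\frac{\mathrm{scale}}{x}\right)^{\mathrm{shape}} = 1 - p
\;\Longrightarrow\;
x = \frac{\mathrm{scale}}{(1 - p)^{1/\mathrm{shape}}}
  = \frac{\mathrm{scale}}{\exp\left(\log(1 - p)/\mathrm{shape}\right)}
```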
The CDF is inverted using high-precision Math functions to support very small 
p-values:
{noformat}
x = scale / exp(log(1 - p) / shape)
  = scale / Math.exp(Math.log1p(-p) / shape);{noformat}
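A minimal Java sketch of this inverse CDF. It assumes the edge cases p = 0 and p = 1 are mapped directly to the support bounds (scale and infinity), consistent with the note below that the inverse CDF detects p = 1; the class name ParetoInverseCdf is hypothetical and is not the Commons Statistics API:

```java
/** Hypothetical sketch of the Pareto inverse CDF; not the Commons Statistics API. */
public final class ParetoInverseCdf {
    private ParetoInverseCdf() {
    }

    /**
     * x = scale / exp(log(1 - p) / shape), assuming scale > 0, shape > 0 and 0 <= p <= 1.
     */
    public static double inverseCdf(double scale, double shape, double p) {
        if (p == 0) {
            // Lower bound of the support
            return scale;
        }
        if (p == 1) {
            // Upper bound of the support. Without this check an infinite shape
            // would give log1p(-1) / shape = -Infinity / Infinity = NaN.
            return Double.POSITIVE_INFINITY;
        }
        // log1p(-p) keeps precision for p below about 1e-16, where 1 - p rounds to 1
        return scale / Math.exp(Math.log1p(-p) / shape);
    }

    public static void main(String[] args) {
        // Median for scale = 1, shape = 2: approximately sqrt(2)
        System.out.println(inverseCdf(1, 2, 0.5));
    }
}
```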
The distribution is sampled using inverse transform sampling as:
{noformat}
x = scale / (1 - p)^(1 / shape)
  = scale / Math.pow(1 - p, 1 / shape){noformat}
This is fast as it requires a single call to Math.pow. It only needs to handle 
p-values down to 2^-53, since sampling generates p as one of the 2^53 dyadic 
rationals in [0, 1).
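A sketch of this sampler, assuming java.util.SplittableRandom as a stand-in for the library's random source; the class and method names are hypothetical. p is formed from the upper 53 bits of a random long, so it is one of the dyadic rationals k * 2^-53 in [0, 1). For such p the subtraction 1 - p is exact, which is why log1p is not needed on this path:

```java
import java.util.SplittableRandom;

/** Hypothetical sketch of inverse transform sampling of a Pareto distribution. */
public final class ParetoSamplerSketch {
    private final double scale;
    // Precomputed once so each sample needs a single call to Math.pow
    private final double oneOverShape;

    public ParetoSamplerSketch(double scale, double shape) {
        this.scale = scale;
        this.oneOverShape = 1 / shape;
    }

    /** Returns one of the 2^53 dyadic rationals k * 2^-53 in [0, 1). */
    public static double nextP(SplittableRandom rng) {
        return (rng.nextLong() >>> 11) * 0x1.0p-53;
    }

    /** x = scale / (1 - p)^(1 / shape). */
    public double sample(double p) {
        return scale / Math.pow(1 - p, oneOverShape);
    }

    public static void main(String[] args) {
        SplittableRandom rng = new SplittableRandom(42);
        ParetoSamplerSketch sampler = new ParetoSamplerSketch(1.0, 3.0);
        for (int i = 0; i < 5; i++) {
            System.out.println(sampler.sample(nextP(rng)));
        }
    }
}
```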

However, it has issues when the shape parameter is extreme: either the shape is 
very large (including infinite) or 1 / shape is infinite.

The table below shows the inverse CDF and the sample value for scale = 1 and an 
extreme shape. p is set to the most extreme dyadic rationals (0, 2^-53, 
1 - 2^-53, 1):
||Shape||p||icdf(p)||sample||
|Infinity|0.0|1.0|1.0|
|Infinity|1.1102230246251565E-16|1.0|1.0|
|Infinity|0.9999999999999999|1.0|1.0|
|Infinity|1.0|Infinity|1.0|
|1.0E300|0.0|1.0|1.0|
|1.0E300|1.1102230246251565E-16|1.0|1.0|
|1.0E300|0.9999999999999999|1.0|1.0|
|1.0E300|1.0|Infinity|Infinity|
|4.9E-324|0.0|1.0|NaN|
|4.9E-324|1.1102230246251565E-16|Infinity|Infinity|
|4.9E-324|0.9999999999999999|Infinity|Infinity|
|4.9E-324|1.0|Infinity|Infinity|
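Assuming the inverse CDF returns the support bounds for p = 0 and p = 1, the table can be recomputed with a small program (the class name is hypothetical). The sample column uses the unguarded Math.pow expression, and the printed rows should match the table above:

```java
/** Hypothetical program that recomputes the table above for scale = 1. */
public final class ParetoExtremeShapeTable {
    /** Inverse CDF sketch; p = 0 and p = 1 are assumed to map to the support bounds. */
    public static double inverseCdf(double scale, double shape, double p) {
        if (p == 0) {
            return scale;
        }
        if (p == 1) {
            return Double.POSITIVE_INFINITY;
        }
        return scale / Math.exp(Math.log1p(-p) / shape);
    }

    /** Unguarded inverse transform sample for a given p. */
    public static double sample(double scale, double shape, double p) {
        return scale / Math.pow(1 - p, 1 / shape);
    }

    public static void main(String[] args) {
        double[] shapes = {Double.POSITIVE_INFINITY, 1e300, Double.MIN_VALUE};
        // Extreme dyadic rationals: 0, 2^-53, 1 - 2^-53, 1
        double[] ps = {0, 0x1.0p-53, 1 - 0x1.0p-53, 1};
        System.out.println("||Shape||p||icdf(p)||sample||");
        for (double shape : shapes) {
            for (double p : ps) {
                // %s formats each boxed Double using Double.toString
                System.out.printf("|%s|%s|%s|%s|%n",
                    shape, p, inverseCdf(1, shape, p), sample(1, shape, p));
            }
        }
    }
}
```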

When 1 / shape is infinite, the NaN occurs at p = 0 because Math.pow(1, Infinity) 
is NaN. In this case the sampling inversion is an error.
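The two expressions can be compared directly at p = 0 with shape = Double.MIN_VALUE (4.9E-324) and scale = 1. The reciprocal of the shape overflows to infinity, so the sampling expression hits the NaN case of Math.pow. In the inverse CDF expression the exponent is log1p(-0) / shape = 0, and exp(0) = 1 gives the scale. The class and method names are hypothetical:

```java
/** Hypothetical comparison of the sampling and inverse CDF expressions (scale = 1). */
public final class OneOverShapeInfinite {
    /** Sampling expression: 1 / (1 - p)^(1 / shape). */
    public static double samplePath(double shape, double p) {
        return 1 / Math.pow(1 - p, 1 / shape);
    }

    /** Inverse CDF expression without edge-case handling: 1 / exp(log1p(-p) / shape). */
    public static double icdfPath(double shape, double p) {
        return 1 / Math.exp(Math.log1p(-p) / shape);
    }

    public static void main(String[] args) {
        double shape = Double.MIN_VALUE;
        // 1 / 4.9E-324 overflows: prints Infinity
        System.out.println(1 / shape);
        // Math.pow(1, Infinity) is NaN, so the sample is NaN: prints NaN
        System.out.println(samplePath(shape, 0));
        // log1p(-0) / shape is zero and exp(0) is 1: prints 1.0
        System.out.println(icdfPath(shape, 0));
    }
}
```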

When the shape is infinite, the mismatch occurs at p = 1: Math.pow(0, 0) == 1, so 
the scale is returned rather than the distribution upper bound. The mismatch 
arises because the inverse CDF detects p = 1 as an edge case and returns the 
upper bound. In this case pure inversion of the CDF creates an outlier and the 
sampling inversion is more suitable. However, when the shape is large and finite 
(e.g. 1.0E300), the sampling inversion at p = 1 also creates an infinite sample, 
which is an outlier compared to all other samples, which equal the scale.
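The p = 1 behaviour for large shapes follows from the Math.pow special cases (scale = 1; the class name is hypothetical). The last two lines show the outlier for shape = 1e300: the largest p below 1 still maps to the scale, but p = 1 maps to infinity:

```java
/** Hypothetical illustration of the p = 1 cases for large shapes (scale = 1). */
public final class LargeShapeEdgeCases {
    /** Unguarded sampling expression: 1 / (1 - p)^(1 / shape). */
    public static double sample(double shape, double p) {
        return 1 / Math.pow(1 - p, 1 / shape);
    }

    public static void main(String[] args) {
        // shape = Infinity: 1 / shape = 0 and Math.pow(0, 0) = 1, so the scale is returned
        System.out.println(sample(Double.POSITIVE_INFINITY, 1)); // 1.0
        // Unguarded inverse CDF exponent at p = 1: -Infinity / Infinity is NaN,
        // which is why the inverse CDF must treat p = 1 as a special case
        System.out.println(Math.log1p(-1) / Double.POSITIVE_INFINITY); // NaN
        // shape = 1e300: Math.pow(0, 1e-300) = 0, so p = 1 gives an infinite sample
        System.out.println(sample(1e300, 1)); // Infinity
        // ... while the largest p below 1 still gives the scale
        System.out.println(sample(1e300, 1 - 0x1.0p-53)); // 1.0
    }
}
```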

The sampling should be updated to avoid the possibility of NaN generation and to 
ensure samples are returned from the main region of the CDF, without outliers.
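One possible approach, sketched here as an assumption and not necessarily the fix adopted for this issue: keep the single Math.pow call, with the uniform deviate u taking the role of 1 - p, but choose which end of the interval to exclude. If 1 / shape is infinite, u = 1 is excluded (avoiding Math.pow(1, Infinity) = NaN); otherwise u = 0 is excluded (avoiding Math.pow(0, y) = 0 for y > 0, the infinite outlier). java.util.SplittableRandom stands in for the random source, and the class name is hypothetical. With the shapes from the table this returns the scale for Infinity and 1.0E300, and Infinity for 4.9E-324, never NaN:

```java
import java.util.SplittableRandom;

/** Hypothetical guarded Pareto sampler; not the Commons Statistics implementation. */
public final class GuardedParetoSampler {
    private final SplittableRandom rng;
    private final double scale;
    private final double oneOverShape;
    // true: draw u from (0, 1]; false: draw u from [0, 1)
    private final boolean excludeZero;

    public GuardedParetoSampler(SplittableRandom rng, double scale, double shape) {
        this.rng = rng;
        this.scale = scale;
        this.oneOverShape = 1 / shape;
        // If 1 / shape is infinite, avoid u = 1 because Math.pow(1, Infinity) is NaN.
        // Otherwise avoid u = 0 because Math.pow(0, y) = 0 for y > 0 gives an infinite sample.
        this.excludeZero = oneOverShape != Double.POSITIVE_INFINITY;
    }

    /** Returns k * 2^-53 with k in [1, 2^53] if excludeZero, else k in [0, 2^53). */
    public double nextU() {
        long k = rng.nextLong() >>> 11;
        return (excludeZero ? k + 1 : k) * 0x1.0p-53;
    }

    /** x = scale / u^(1 / shape), where u plays the role of 1 - p. */
    public double sample() {
        return scale / Math.pow(nextU(), oneOverShape);
    }
}
```

This keeps all 2^53 uniform values and one Math.pow call per sample; excluding both endpoints would also avoid the problem at the cost of one bit of resolution.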

 

  was:
The Pareto distribution has CDF:
{noformat}
             ( scale )^shape
CDF(x) = 1 - ( ----- )
             (   x   ){noformat}
The CDF is inverted using high-precision Math functions to support very small 
p-values:
{noformat}
x = scale / exp(log(1 - p) / shape)
  = scale / Math.exp(Math.log1p(-p) / shape);{noformat}
The distribution is sampled using inverse transform sampling as:
{noformat}
x = scale / (1 - p)^(1 / shape)
  = scale / Math.pow(1 - p, 1 / shape){noformat}
This is fast as it requires a single call to Math.pow. It only needs to handle 
p-values down to 2^-53, since sampling generates p as one of the 2^53 dyadic 
rationals in [0, 1).

However, it has issues when the shape parameter is extreme: either the shape is 
infinite or 1 / shape is infinite.

The table below shows the inverse CDF and the sample value for scale = 1 and an 
extreme shape. p is set to the most extreme dyadic rationals (0, 2^-53, 
1 - 2^-53, 1):
||Shape||p||icdf(p)||sample||
|Infinity|0.0|1.0|1.0|
|Infinity|1.1102230246251565E-16|1.0|1.0|
|Infinity|0.9999999999999999|1.0|1.0|
|Infinity|1.0|Infinity|1.0|
|4.9E-324|0.0|1.0|NaN|
|4.9E-324|1.1102230246251565E-16|Infinity|Infinity|
|4.9E-324|0.9999999999999999|Infinity|Infinity|
|4.9E-324|1.0|Infinity|Infinity|

When 1 / shape is infinite, the NaN occurs at p = 0 because Math.pow(1, Infinity) 
is NaN. In this case the sampling inversion is an error.

When the shape is infinite, the mismatch occurs at p = 1: Math.pow(0, 0) == 1, so 
the scale is returned rather than the distribution upper bound. The mismatch 
arises because the inverse CDF detects p = 1 as an edge case and returns the 
upper bound. In this case pure inversion of the CDF creates an outlier and the 
sampling inversion is more suitable.

The sampling should be updated to avoid the possibility of NaN generation and to 
ensure samples are returned from the main region of the CDF, without outliers.

 


> Correct Pareto distribution sampling with extreme shape parameter
> -----------------------------------------------------------------
>
>                 Key: STATISTICS-59
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-59
>             Project: Commons Statistics
>          Issue Type: Improvement
>          Components: distribution
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Minor
>             Fix For: 1.0
>
>         Attachments: pareto.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
