[
https://issues.apache.org/jira/browse/STATISTICS-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220361#comment-17220361
]
Andreas Stefik commented on STATISTICS-25:
------------------------------------------
The easiest way I've found to do this is to use an online Python interpreter,
like the one here:
[https://www.tutorialspoint.com/execute_python_online.php]
Then I dumped in this script:
{code:python}
import scipy.stats
print scipy.stats.t.cdf(0.025, 1)
print scipy.stats.t.cdf(0.025, 10)
print scipy.stats.t.cdf(0.025, 1e2)
print scipy.stats.t.cdf(0.025, 1e3)
print scipy.stats.t.cdf(0.025, 1e5)
print scipy.stats.t.cdf(0.025, 1e10)
print scipy.stats.t.cdf(0.025, 1e20)
print scipy.stats.t.cdf(0.025, 1e40)
{code}
In this case, the output is as follows:
0.507956089912
0.509726595102
0.509947608093
0.509970024339
0.509972493254
0.509972518195
0.509972518195
0.509972518195
The equivalent function in R, pt, gives:
{code:java}
> options(digits=17)
> pt(0.025, 1)
[1] 0.50795608991202579
> pt(0.025, 10)
[1] 0.50972659510159002
> pt(0.025, 1e2)
[1] 0.50994760809308248
> pt(0.025, 1e3)
[1] 0.50997002433945715
> pt(0.025, 1e5)
[1] 0.50997249325358851
> pt(0.025, 1e10)
[1] 0.50997251819498857
> pt(0.025, 1e20)
[1] 0.50997251819523803
> pt(0.025, 1e40)
[1] 0.50997251819523803
{code}
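As a third, independent cross-check for the smaller df values, the CDF can also be computed directly from the t density using only the Python standard library. This is a rough sketch, not what SciPy, R, or Commons do internally: `t_pdf` and `t_cdf` are names made up for illustration, the integration is plain Simpson's rule, and the log-gamma difference loses precision for very large df, so it is only meant for df up to about 1e3.

```python
import math

def t_pdf(x, df):
    """Student t density, with the normalising constant computed in log space."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """CDF for x >= 0: 0.5 (by symmetry) plus Simpson's rule on [0, x].

    steps must be even for Simpson's rule.
    """
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior nodes, 2 for even ones
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0

# Reference values copied from the R session above
r_values = {1: 0.50795608991202579,
            10: 0.50972659510159002,
            1e2: 0.50994760809308248,
            1e3: 0.50997002433945715}

for df, expected in r_values.items():
    print(df, t_cdf(0.025, df), abs(t_cdf(0.025, df) - expected))
```

The printed differences give a quick sense of how closely the three independent computations agree in the range where all of them are well behaved.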
Then in Apache Commons:
{code:java}
System.out.println("" + new TDistribution(1).cumulativeProbability(0.025));
System.out.println("" + new TDistribution(10).cumulativeProbability(0.025));
System.out.println("" + new TDistribution(1e2).cumulativeProbability(0.025));
System.out.println("" + new TDistribution(1e3).cumulativeProbability(0.025));
System.out.println("" + new TDistribution(1e5).cumulativeProbability(0.025));
System.out.println("" + new TDistribution(1e10).cumulativeProbability(0.025));
System.out.println("" + new TDistribution(1e20).cumulativeProbability(0.025));
System.out.println("" + new TDistribution(1e40).cumulativeProbability(0.025));
{code}
{code:java}
0.5079560899120266
0.509726595101589
0.509947608093117
0.5099700243396535
0.5099724932544486
0.5099729613741282
1.0
1.0{code}
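The results agree closely up to df = 1e5. At df = 1e10 Commons already differs
from R and SciPy in the seventh decimal place, and from df = 1e20 onward it
returns 1.0. I only noticed the problem because I'm doing some Tukey
computations, and the most modern algorithms require df of infinity for certain
parts of those calculations.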
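Since the t distribution converges to the standard normal as df goes to infinity, the limiting value can be sanity-checked with nothing but the Python standard library. This is only a sketch: `normal_cdf` is a helper name made up for illustration, and the reference constant is copied from the R output for df = 1e20 above.

```python
import math

def normal_cdf(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Value reported by R for pt(0.025, 1e20) in the session above
r_value_df_1e20 = 0.50997251819523803

limit = normal_cdf(0.025)
print(limit)
print(abs(limit - r_value_df_1e20))  # difference from the R result
```

If the difference is negligible, the large-df values from R and SciPy are consistent with the normal limit, and the 1.0 returned by Commons is not.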
Hope that helps and thanks again for helping track this down!
> T Distribution Inverse Cumulative Probability Function gives the Wrong Answer
> -----------------------------------------------------------------------------
>
> Key: STATISTICS-25
> URL: https://issues.apache.org/jira/browse/STATISTICS-25
> Project: Apache Commons Statistics
> Issue Type: Bug
> Reporter: Andreas Stefik
> Priority: Major
>
> Hi There,
> Given code like this:
>
> import org.apache.commons.math3.analysis.UnivariateFunction;
> import org.apache.commons.math3.analysis.solvers.BrentSolver;
> import org.apache.commons.math3.distribution.TDistribution;
> public class Main {
>     public static void main(String[] args) {
>         double df = 1E38;
>         double t = 0.975;
>         TDistribution dist = new TDistribution(df);
>
>         double prob = dist.inverseCumulativeProbability(1.0 - t);
>
>         System.out.println("Prob: " + prob);
>     }
> }
>
> It is possible I am misunderstanding, but that seems equivalent to:
>
> scipy.stats.t.cdf(1.0 - 0.975, 1e38)
>
> In Python. They give different answers. Python gives 0.509972518193, which
> seems correct, whereas Apache Commons gives Prob: -6.462184036284304E-10.
> That's a huge difference.
> My hunch is that as you get closer to infinity it begins to fail, but I
> haven't checked carefully. For calls with much smaller degrees of freedom,
> you get answers that are basically the same as Python or online calculators.
>