Implementation of Single Sample T-Test using Map Reduce/Mahout
--------------------------------------------------------------

                 Key: MAHOUT-1000
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1000
             Project: Mahout
          Issue Type: New Feature
          Components: Math
    Affects Versions: Backlog
         Environment: Linux, Mac OS, Hadoop 0.20.2, Mahout 0.x
            Reporter: Dev Lakhani
             Fix For: Backlog


Implement a map/reduce version of the single sample t test to test whether a 
sample of n subjects comes from a population in which the mean equals a 
particular value.

For a large dataset, say millions of rows, the test checks whether the sample 
(large as it is) comes from a population with the specified mean.
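
For reference, the single sample t statistic is usually written as below. The 
symbols are notation assumed here, not taken from the issue: x_i are the sample 
values, \bar{x} the sample mean, \mu_0 the specified population mean, s the 
sample standard deviation and n the sample size.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
\mathrm{df} = n - 1
```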

Input:
1) specified population mean to be tested against
2) hypothesis direction: one of "two.sided", "less", "greater"
3) confidence level or alpha
4) flag to indicate paired or not paired

The procedure is as follows:
1. Use Map/Reduce to calculate the mean of the sample.
2. Use Map/Reduce to calculate the standard error of the sample mean.
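3. Use Map/Reduce to calculate the t statistic.
4. Determine the degrees of freedom: n - 1 for a single sample (or for n 
paired differences). The equal-variance question only arises when comparing 
two independent samples.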
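
The procedure above could be sketched roughly as follows. This is a plain 
Python illustration, not Mahout or Hadoop code: the function names 
(map_partition, combine, t_statistic) are hypothetical, and each input split is 
assumed to be an in-memory list of numbers. Instead of one job per step, the 
sketch makes a single pass that collects (count, mean, sum of squared 
deviations) per split using Welford's update, then merges the partial results 
with the pairwise formula of Chan et al.

```python
from functools import reduce
from math import sqrt


def map_partition(values):
    """Mapper (hypothetical): summarize one split as (count, mean, M2).

    M2 is the sum of squared deviations from the running mean,
    maintained with Welford's online update.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def combine(a, b):
    """Reducer (hypothetical): merge two partial summaries."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def t_statistic(partitions, mu0):
    """Return (t, degrees of freedom) for H0: population mean == mu0."""
    n, mean, m2 = reduce(combine, map(map_partition, partitions), (0, 0.0, 0.0))
    if n < 2:
        raise ValueError("need at least two observations")
    std_err = sqrt(m2 / (n - 1)) / sqrt(n)  # standard error of the sample mean
    if std_err == 0.0:
        raise ValueError("sample variance is zero; t is undefined")
    return (mean - mu0) / std_err, n - 1


# Two splits standing in for two mapper inputs.
t, df = t_statistic([[1.0, 2.0, 3.0], [4.0, 5.0]], mu0=2.0)
```

Because combine is associative, it could also serve as a combiner, so each 
mapper would emit only one small record per split. For a paired test, the 
mapper would first reduce each pair to its difference and test the differences 
against the specified mean.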

Output:
1) The value of the t-statistic.
2) The p-value for the test.
3) Flag that is true if the null hypothesis can be rejected at significance 
level alpha (confidence 1 - alpha); false otherwise.
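
A minimal sketch of how outputs 2 and 3 might be derived from the t statistic. 
The names p_value and reject_null are hypothetical. As a loudly stated 
assumption, the sketch uses the standard normal distribution in place of the 
t distribution, which is only reasonable for the very large n this issue 
targets; an exact result needs the t distribution with n - 1 degrees of 
freedom.

```python
from statistics import NormalDist


def p_value(t, alternative="two.sided"):
    """Approximate p-value for a t statistic.

    ASSUMPTION: n is large, so t is treated as standard normal.
    """
    cdf = NormalDist().cdf(t)
    if alternative == "less":
        return cdf
    if alternative == "greater":
        return 1.0 - cdf
    if alternative == "two.sided":
        return 2.0 * min(cdf, 1.0 - cdf)
    raise ValueError("alternative must be 'two.sided', 'less' or 'greater'")


def reject_null(t, alpha=0.05, alternative="two.sided"):
    """True if the null hypothesis is rejected at significance level alpha."""
    return p_value(t, alternative) < alpha
```

The direction strings match the input values listed in the issue, so the same 
parameter could be passed straight through from the job configuration.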

References
http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html
