Hi Thomas, I'm not too sure about your interpretation. Consider: set.seed(54321) X <- rnorm(10) ; Y <- rnorm(10) XY <- c(X,Y); group<-c(rep(0,10),rep(1,10))
t.test(X,Y,paired=TRUE) # Paired t-test # data: X and Y # t = -1.5265, df = 9, p-value = 0.1612 # 95 percent confidence interval: # -2.1521562 0.4178885 # sample estimates: mean of the differences # -0.8671339 t.test(XY~group,paired=TRUE) # Paired t-test # data: XY by group # t = -1.5265, df = 9, p-value = 0.1612 # 95 percent confidence interval: # -2.1521562 0.4178885 # sample estimates: mean of the differences # -0.8671339 t.test(X,Y,paired=FALSE) # Welch Two Sample t-test # data: X and Y # t = -1.4865, df = 17.956, p-value = 0.1545 # 95 percent confidence interval: # -2.0928836 0.3586159 # sample estimates: mean of x mean of y # -0.2646042 0.6025297 t.test(XY~group,paired=FALSE) # Welch Two Sample t-test # data: XY by group # t = -1.4865, df = 17.956, p-value = 0.1545 # 95 percent confidence interval: # -2.0928836 0.3586159 # sample estimates: mean in group 0 mean in group 1 # -0.2646042 0.6025297 So in each case t.test(XY~group,...) has done the same as the corresponding t.test(X,Y,...); thus it would not seem that "The formula interface is only applicable for the 2-sample tests" excludes specifying the two samples by a formula with a 2-level factor; and it would seem that the pairing has been correctly carried out (i.e. in the formula case t.test assumes that it is by sequential position within the respective group, without having to be told as you suggest). In other words, it acts as though giving a formula with a 2-group factor on the right, and equal numbers in the two groups, is equivalent to giving the two samples separately. Johannes' original query was about differences when there are NAs, corresponding to different settings of "na.action". It is perhaps possible that 'na.action="na.pass"' and 'na.action="na.exclude"' result in different pairings in the case "paired=TRUE". However, it seems to me that the differences he observed are, shall we say, obscure! Ted. On 13-Aug-10 22:31:45, Thomas Lumley wrote: > Thanks for the clear example. However, if there is a bug it is > only that t.test.formula doesn't throw an error when given the > paired=TRUE option. > > The documentation says "The formula interface is only applicable > for the 2-sample tests.", but there probably should be an explicit > check -- I didn't see that in the documentation until I realized > that t.test.formula couldn't work for paired tests because you don't > tell it which observations are paired. > -thomas > > On Fri, 13 Aug 2010, Johannes W. Dietrich wrote: >> Hello all, >> >> due to unexplained differences between statistical results from >> collaborators >> and our lab that arose in the same large proteomics dataset we >> reevaluated >> the t.test() function. Here, we found a weird behaviour that is also >> reproducible in the following small test dataset: >> >> Suppose, we have two vectors with numbers and some missing values that >> refer >> to the same individuals and that should therefore be evaluated with a >> paired >> t-test: >> >>> testdata.A <- c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01); >>> testdata.B <- c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA); >> >> Then >> >>> print(t.test(testdata.A, testdata.B, paired=TRUE, >>> alternative="two.sided", >>> na.action="na.pass")) >> >> and >> >>> print(t.test(testdata.A, testdata.B, paired=TRUE, >>> alternative="two.sided", >>> na.action="na.exclude")) >> >> deliver the same p value (0.1162, identical to Excel's result). >> >> However, after combining the two vectors with >> >>> testdata <- c(testdata.A, testdata.B); >> >> and defining a criterion vector with >> >>> criterion <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1); >> >> (that is the type of data layout we have in our proteomics project) we >> get a >> different p-value (0.01453) with >> >>> print(t.test(testdata ~ criterion, paired=TRUE, >>> alternative="two.sided", >>> na.action="na.exclude")) . >> >> The statement >> >>> print(t.test(testdata ~ criterion, paired=TRUE, >>> alternative="two.sided", >>> na.action="na.pass")) >> >> however, delivers a p-value of 0.1162 again. >> >> With >> >>> print(t.test(testdata[criterion==0], testdata[criterion==1], >>> paired=TRUE, >>> alternative="two.sided", na.action="na.exclude")) >> >> that imitates the first form, we get again a p value of 0.1162. >> >> What is the reason for the different p values? Should not all calls to >> t.test >> that exlude missing values be equivalent and therefore deliver the >> same >> results? >> >> Excel, StatView and KaleidaGraph all display a p-value of 0.1162. >> >> J. W. D. >> -- >> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- >> -- -- >> -- Dr. Johannes W. Dietrich, M.D. >> -- Laboratory XU44, Endocrine Research >> -- Medical Hospital I, Bergmannsheil University Hospitals >> -- Ruhr University of Bochum >> -- Buerkle-de-la-Camp-Platz 1, D-44789 Bochum, NRW, Germany >> -- Phone: +49:234:302-6400, Fax: +49:234:302-6403 >> -- eMail: "j.w.dietr...@medical-cybernetics.de" >> -- WWW: http://medical-cybernetics.de >> -- WWW: http://www.bergmannsheil.de >> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- >> -- -- >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > Thomas Lumley > Professor of Biostatistics > University of Washington, Seattle > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -------------------------------------------------------------------- E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 14-Aug-10 Time: 00:49:57 ------------------------------ XFMail ------------------------------ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.