Re: [R] Difference Between R: wilcox.test and STATA: signrank
Hi,

Look at the output of the test made in R and you can see it is a Wilcoxon rank sum test and not a Wilcoxon signed rank test.

If there are ties, I know I prefer wilcox.exact from the exactRankTests package.

Alain

On 09-Aug-10 12:43, Capasia wrote:
> This is my first post to the mailing list and I guess it's a pretty stupid question, but I can't figure it out. I hope this is the right forum for this kind of question.
>
> Before I started using R I was using STATA to run a Wilcoxon signed-rank test on two variables. See data below:
>
> https://spreadsheets.google.com/pub?key=0ApodAA2GAEP_dDZkdzZHSFBqX1JHOWJBX1dMQUZCVkEhl=enoutput=html
>
> STATA Output:
>
> . signrank x=y
>
> Wilcoxon signed-rank test
>
>         sign |      obs   sum ranks    expected
> -------------+---------------------------------
>     positive |       41        3101      2330.5
>     negative |       18        1560      2330.5
>         zero |       49        1225        1225
> -------------+---------------------------------
>          all |      108        5886        5886
>
> unadjusted variance    106438.50
> adjustment for ties      -282.38
> adjustment for zeros   -10106.25
>                        ---------
> adjusted variance       96049.88
>
> Ho: transfer_2_a = transfer_2_b
>              z =   2.486
>     Prob > |z| =   *0.0129*
>
> When running a Wilcoxon signed-rank test in R:
>
> wilcox.test(datablatt$x, datablatt$y)
>
>         Wilcoxon rank sum test with continuity correction
>
> data:  datablatt$x and datablatt$y
> W = 7059.5, p-value = *0.09197*
> alternative hypothesis: true location shift is not equal to 0
>
> As you can see, the p-values are different (one with H0 rejection and the other one not). I tested whether it could be that the STATA one isn't paired, but this doesn't seem to be the problem. I'm dumbfounded as to what could lead to such a difference. I couldn't find any settings I have missed, but I somehow guess I'm using the function in the wrong way... Any ideas?
>
> Thanks a lot in advance!

-- 
Alain Guillet
Statistician and Computer Scientist
SMCS - IMMAQ - Université catholique de Louvain
Bureau c.316
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium
tel: +32 10 47 30 50

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Difference Between R: wilcox.test and STATA: signrank
On Aug 9, 2010, at 3:03 PM, Alain Guillet wrote:

> Hi,
>
> Look at the output of the test made in R and you can see it is a Wilcoxon rank sum test and not a Wilcoxon signed rank test.

It might be helpful to add that paired=TRUE is needed in the call to get the signed-rank test.

> If there are ties, I know I prefer wilcox.exact from the exactRankTests.

(Not that much of an issue in larger sample sizes, I'd say. Even with binary data, the normal approximation works reasonably well under the usual assumptions of expected counts > 5, since the tie-adjustment for the variance is exact for the distribution of the ranks. The continuity correction doesn't quite work though. Anyways, wilcox.exact is of course a nice thing to have.)

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
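The effect of paired=TRUE can be seen on a small made-up data set. The sketch below uses hypothetical vectors x and y (not the datablatt data from the thread) and passes exact=FALSE only to avoid the warnings that ties and zero differences would otherwise trigger. The method string of the returned object shows which test was actually run.

```r
# Hypothetical paired measurements with ties and zero differences;
# illustrative values only, not the data discussed in the thread.
x <- c(0, 0, 1, 2, 2, 3, 3.5, 5, 1.5, 2, 0.5, 3)
y <- c(0, 1, 1, 1, 2, 2, 5.0, 5, 1.0, 0, 0.0, 2)

# Default call: x and y are treated as two independent samples,
# which gives the rank sum test (statistic W).
unpaired <- wilcox.test(x, y, exact = FALSE)

# paired = TRUE: the within-pair differences are ranked,
# which gives the signed rank test (statistic V).
paired <- wilcox.test(x, y, paired = TRUE, exact = FALSE)

cat(unpaired$method, "\n")
cat(paired$method, "\n")
```

Checking the method line (or the W versus V label on the statistic) is a quick way to confirm that the intended test was run before comparing p-values across packages.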
Re: [R] Difference Between R: wilcox.test and STATA: signrank
On Aug 9, 2010, at 9:52 AM, peter dalgaard wrote:

> On Aug 9, 2010, at 3:03 PM, Alain Guillet wrote:
>
>> Hi,
>>
>> Look at the output of the test made in R and you can see it is a Wilcoxon rank sum test and not a Wilcoxon signed rank test.
>
> It might be helpful to add that paired=TRUE is needed in the call to get the signed-rank test.
>
>> If there are ties, I know I prefer wilcox.exact from the exactRankTests.
>
> (Not that much of an issue in larger sample sizes, I'd say. Even with binary data, the normal approximation works reasonably well under the usual assumptions of expected counts > 5, since the tie-adjustment for the variance is exact for the distribution of the ranks. The continuity correction doesn't quite work though. Anyways, wilcox.exact is of course a nice thing to have.)

The OP's data:

table(xvals=dat$x, yvals=dat$y)
      yvals
xvals   0 0.25 0.5  1 1.1 1.5  2  3 3.5  5 5.5  6  8
  0    35    0   0  1   0   1  2  1   0  0   0  0  0
  0.5   2    1   1  0   0   0  0  0   0  0   0  0  0
  0.75  0    0   1  0   0   0  0  0   0  0   0  0  0
  1     7    0   1  3   0   0  1  0   1  0   0  0  0
  1.1   0    0   0  0   1   0  0  0   0  0   0  0  0
  1.5   1    1   0  4   0   2  0  0   0  0   0  0  0
  2     3    0   0  6   0   2  4  2   1  0   0  0  0
  2.1   0    0   1  0   0   0  0  0   0  0   0  0  0
  2.5   0    0   0  0   0   1  0  0   0  2   0  0  0
  3     2    0   0  0   0   0  5  3   1  1   0  0  0
  3.3   1    0   0  0   0   0  0  0   0  0   0  0  0
  3.33  0    0   0  1   0   0  0  0   0  0   0  0  0
  3.5   0    0   0  1   0   1  0  0   0  1   1  0  0
  5     0    0   0  0   0   0  0  0   0  2   0  1  1
  10    0    0   0  0   0   0  0  0   0  1   0  0  0

Adding paired=TRUE to the wilcox.test call gives the signed rank test, although that is not likely to satisfy the OP since she seems to be expecting a higher degree of congruence with Stata. The wilcox.test and wilcox.exact results differ only at the 4th decimal place.

wilcox.test(dat$x, dat$y, paired=TRUE)

        Wilcoxon signed rank test with continuity correction

data:  dat$x and dat$y
V = 1181, p-value = 0.08872
alternative hypothesis: true location shift is not equal to 0

wilcox.exact(dat$x, dat$y, paired=TRUE)

        Asymptotic Wilcoxon signed rank test

data:  dat$x and dat$y
V = 1181, p-value = 0.08805
alternative hypothesis: true mu is not equal to 0

The Stata output indicates some sort of adjustment for zeros. wilcox.test basically throws out the zeros (presumably the zero differences), so there may be a difference in the algorithm. Her data has 51 zero differences and 61 non-zero differences.

sum(dat$x==dat$y)
[1] 51
sum(dat$x!=dat$y)
[1] 61

Wait a minute; the Stata report said she had 49 zeros and only 108 records. Different data. Different results. I suppose it could be my editing errors. Taking out all the extraneous HTML junk and restoring missing delimiters was kind of a pain.

Capasia: don't use Google Sheets to transmit data. Instead, use dput on the datablatt object and just post the results of that output.

-- 
David.
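The difference in how zero differences are handled can be made concrete. The Stata table quoted in the thread ranks all 108 differences, zeros included (the zero row holds ranks 1 to 49, summing to 1225), and then reduces the variance, whereas wilcox.test drops the zeros before ranking. The following is a minimal sketch of a zero-retaining statistic. The function name signrank_keep_zeros is hypothetical, and the variance formula (a quarter of the sum of squared signed midranks) is an assumption inferred from the printed numbers, not taken from Stata's source. Only the unadjusted variance, the zero adjustment and z can be checked against the quoted output; the tie adjustment would need the original data.

```r
# Sketch (assumption): signed-rank z statistic that keeps zero differences
# in the ranking and gives them a signed rank of 0.
signrank_keep_zeros <- function(x, y) {
  d <- x - y
  n <- length(d)
  r <- rank(abs(d))                      # midranks; zeros take the lowest ranks
  t_pos <- sum(r[d > 0])                 # rank sum of the positive differences
  zero_ranks <- sum(r[d == 0])           # rank sum held by the zero differences
  expected <- (n * (n + 1) / 2 - zero_ranks) / 2
  adj_var <- sum((sign(d) * r)^2) / 4    # zeros contribute nothing
  z <- (t_pos - expected) / sqrt(adj_var)  # no continuity correction
  list(n = n, n_zero = sum(d == 0), t_pos = t_pos, expected = expected,
       adj_var = adj_var, z = z, p_value = 2 * pnorm(-abs(z)))
}

# Arithmetic check against the numbers printed in the quoted Stata output.
n  <- 108                                          # all observations
n0 <- 49                                           # zero differences
unadjusted <- n * (n + 1) * (2 * n + 1) / 24       # 106438.5
zero_adj   <- -n0 * (n0 + 1) * (2 * n0 + 1) / 24   # -10106.25
z_stata    <- (3101 - 2330.5) / sqrt(96049.88)     # about 2.486
p_stata    <- 2 * pnorm(-abs(z_stata))             # about 0.0129

# Tiny hypothetical example with one zero difference.
res <- signrank_keep_zeros(c(3, 1, 2, 5), c(1, 1, 3, 1))
```

Because the zeros stay in the ranking, the non-zero differences receive larger ranks than they do in wilcox.test, so the two programs can report different p-values even on identical data and with paired=TRUE.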