On Aug 9, 2010, at 9:52 AM, peter dalgaard wrote:
On Aug 9, 2010, at 3:03 PM, Alain Guillet wrote:
Hi,
Look at the output of the test made in R and you can see it is a
Wilcoxon rank sum test and not a Wilcoxon signed rank test.
It might be helpful to add that paired=TRUE is needed in the call to
get the signed-rank test.
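A minimal sketch of that point, using made-up paired data (the vectors `before` and `after` are illustrative assumptions, not the OP's data). The same function runs a different test depending on `paired`, and the `method` element of the result says which one was used:

```r
# Illustrative paired measurements; NOT the OP's data.
set.seed(42)
before <- rnorm(20, mean = 10)
after  <- before + rnorm(20, mean = 0.5)

# Default (paired = FALSE): two independent samples, rank sum test.
unpaired <- wilcox.test(before, after)

# paired = TRUE: signed rank test on the within-pair differences.
paired <- wilcox.test(before, after, paired = TRUE)

# The method string and the statistic name (W vs. V) identify the test.
print(unpaired$method)
print(names(unpaired$statistic))
print(paired$method)
print(names(paired$statistic))
```

Checking `method` before comparing p-values across packages is a cheap way to confirm that both programs ran the same test.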
If there are ties, I prefer wilcox.exact from the exactRankTests
package.
(Not that much of an issue in larger sample sizes, I'd say. Even
with binary data, the normal approximation works reasonably well
under the usual assumptions of expected counts > 5, since the tie-
adjustment for the variance is exact for the distribution of the
ranks. The continuity correction doesn't quite work though. Anyways,
wilcox.exact is of course a nice thing to have.)
The OP's data:
> table(xvals=dat$x, yvals=dat$y)
      yvals
xvals   0 0.25 0.5 1 1.1 1.5 2 3 3.5 5 5.5 6 8
  0    35    0   0 1   0   1 2 1   0 0   0 0 0
  0.5   2    1   1 0   0   0 0 0   0 0   0 0 0
  0.75  0    0   1 0   0   0 0 0   0 0   0 0 0
  1     7    0   1 3   0   0 1 0   1 0   0 0 0
  1.1   0    0   0 0   1   0 0 0   0 0   0 0 0
  1.5   1    1   0 4   0   2 0 0   0 0   0 0 0
  2     3    0   0 6   0   2 4 2   1 0   0 0 0
  2.1   0    0   1 0   0   0 0 0   0 0   0 0 0
  2.5   0    0   0 0   0   1 0 0   0 2   0 0 0
  3     2    0   0 0   0   0 5 3   1 1   0 0 0
  3.3   1    0   0 0   0   0 0 0   0 0   0 0 0
  3.33  0    0   0 1   0   0 0 0   0 0   0 0 0
  3.5   0    0   0 1   0   1 0 0   0 1   1 0 0
  5     0    0   0 0   0   0 0 0   0 2   0 1 1
  10    0    0   0 0   0   0 0 0   0 1   0 0 0
Adding paired=TRUE to the wilcox.test call gives the signed rank test,
although that is not likely to satisfy the OP since she seems to be
expecting a higher degree of congruence with Stata.
The wilcox.test and wilcox.exact give results that only differ at the
4th decimal place.
> wilcox.test(dat$x, dat$y, paired=TRUE)
Wilcoxon signed rank test with continuity correction
data: dat$x and dat$y
V = 1181, p-value = 0.08872
alternative hypothesis: true location shift is not equal to 0
> wilcox.exact(dat$x, dat$y, paired=TRUE)
Asymptotic Wilcoxon signed rank test
data: dat$x and dat$y
V = 1181, p-value = 0.08805
alternative hypothesis: true mu is not equal to 0
The Stata output indicates some sort of adjustment for zeros. The
wilcox.test basically throws out the zeros (presumably the zero
differences), so there may be a difference in the algorithm. Her data
has 51 zero differences and 61 non-zero differences.
> sum(dat$x==dat$y)
[1] 51
> sum(dat$x!=dat$y)
[1] 61
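One way to explore the "adjustment for zeros" idea is the sketch below. It assumes, based only on the layout of the printed Stata table and not on Stata's source, that the absolute differences are ranked with the zeros included, that zero differences then contribute nothing to the signed sums, and that the variance is one quarter of the sum of squared ranks of the non-zero differences. The function name `signrank_with_zeros` is made up for illustration.

```r
# Sketch of a signed-rank z test that keeps zero differences in the ranking.
# ASSUMPTION: this mimics what the Stata table appears to do; it is not
# taken from Stata's documentation or code.
signrank_with_zeros <- function(x, y) {
  d <- x - y
  d <- d[!is.na(d)]

  # Midranks of |d|; zero differences are ranked along with everything else.
  r <- rank(abs(d))

  t_pos <- sum(r[d > 0])            # sum of ranks for positive differences
  t_neg <- sum(r[d < 0])            # sum of ranks for negative differences
  expected <- (t_pos + t_neg) / 2   # zeros' ranks are excluded from both sums

  # Variance of t_pos under H0: only non-zero differences carry a random sign.
  # Using midranks here also accounts for ties among non-zero differences.
  v_adj <- sum(r[d != 0]^2) / 4

  z <- (t_pos - expected) / sqrt(v_adj)
  list(n = length(d), n_zero = sum(d == 0),
       t_pos = t_pos, t_neg = t_neg, expected = expected,
       v_adj = v_adj, z = z, p.value = 2 * pnorm(-abs(z)))
}

# Tiny hand-checkable example: differences are 0, 1, -2, 3, 0.
toy <- signrank_with_zeros(c(1, 2, 3, 4, 5), c(1, 1, 5, 1, 5))
print(toy)
```

Calling `signrank_with_zeros(dat$x, dat$y)` on the real data would show whether this treatment of zeros moves the p-value towards the Stata figure; wilcox.test, by contrast, drops the zero differences before ranking.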
Wait a minute; the Stata report said she had 49 zeros and only 108
records.
Different data. Different results. I suppose it could be my editing
errors. Taking out all the extraneous html junk and restoring missing
delimiters was kind of a pain.
Capasia: don't use Google Sheets to transmit data. Instead, use dput()
on the datablatt object and just post the output.
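A short sketch of what that looks like, using a made-up three-row data frame (`datablatt_demo` is a stand-in, not the real datablatt object). dput() prints R code that rebuilds the object, so a reader can paste it straight into a session:

```r
# Stand-in for the real data; the values are illustrative only.
datablatt_demo <- data.frame(x = c(0, 0.5, 1), y = c(0, 0.25, 1))

# This is the text you would paste into the email.
dput(datablatt_demo)

# What a reader does on the other end: evaluate the pasted text.
pasted   <- paste(capture.output(dput(datablatt_demo)), collapse = "\n")
restored <- eval(parse(text = pasted))

# The rebuilt object matches the original, column types included.
print(all.equal(restored, datablatt_demo))
```

Unlike a spreadsheet link, this keeps the exact values and the number of rows, so everyone tests the same data.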
--
David.
Alain
On 09-Aug-10 12:43, Capasia wrote:
This is my first post to the mailing list and I guess it's a pretty
stupid question, but I can't figure it out. I hope this is the right
forum for this kind of question.
Before I started using R I was using STATA to run a Wilcoxon
signed-rank test on two variables. See data below:
https://spreadsheets.google.com/pub?key=0ApodAA2GAEP_dDZkdzZHSFBqX1JHOWJBX1dMQUZCVkE&hl=en&output=html
STATA Output:
. signrank x=y
Wilcoxon signed-rank test
        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |       41        3101      2330.5
    negative |       18        1560      2330.5
        zero |       49        1225        1225
-------------+---------------------------------
         all |      108        5886        5886
unadjusted variance   106438.50
adjustment for ties     -282.38
adjustment for zeros  -10106.25
                     ----------
adjusted variance      96049.88
Ho: transfer_2_a = transfer_2_b
             z =   2.486
    Prob > |z| =   *0.0129*
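The numbers in that table can be checked by hand. The sketch below assumes the usual large-sample signed-rank formulas, n(n+1)(2n+1)/24 for the unadjusted variance and the same expression in the number of zeros for the zero adjustment; the tie adjustment is simply copied from the printout because it cannot be recomputed without the data. The variable names are illustrative.

```r
# Values read off the Stata table.
n        <- 108      # all observations
n_zero   <- 49       # zero differences
t_pos    <- 3101     # sum of ranks, positive differences
expected <- 2330.5   # expected sum of ranks under H0
tie_adj  <- -282.38  # copied from the printout, not recomputed

# ASSUMED formulas; they reproduce the printed variance lines.
v_unadj  <- n * (n + 1) * (2 * n + 1) / 24
zero_adj <- -n_zero * (n_zero + 1) * (2 * n_zero + 1) / 24
v_adj    <- v_unadj + tie_adj + zero_adj

# Normal approximation, no continuity correction.
z <- (t_pos - expected) / sqrt(v_adj)
p <- 2 * pnorm(-abs(z))

print(c(v_unadj = v_unadj, zero_adj = zero_adj, v_adj = v_adj))
print(c(z = z, p = p))
```

The result agrees with the printed z and p-value to the displayed precision, which suggests the Stata figure comes from a normal approximation without continuity correction, with the zero differences kept in the ranking.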
When running a Wilcoxon signed-rank test in R, I get:
wilcox.test(datablatt$x, datablatt$y)
Wilcoxon rank sum test with continuity correction
data: datablatt$x and datablatt$y
W = 7059.5, p-value = *0.09197*
alternative hypothesis: true location shift is not equal to 0
As you can see, the p-values are different (one rejects H0 and the
other does not). I tested whether it could be that the STATA one isn't
paired, but this doesn't seem to be the problem.
I'm dumbfounded as to what could lead to such a difference. I couldn't
find any settings I have missed, but somehow I guess I'm using the
function in the wrong way...
Any ideas?
Thanks a lot in advance!
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Alain Guillet
Statistician and Computer Scientist
SMCS - IMMAQ - Université catholique de Louvain
Bureau c.316
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium
tel: +32 10 47 30 50
--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk Priv: pda...@gmail.com
David Winsemius, MD
West Hartford, CT