Re: [R] Difference Between R: wilcox.test and STATA: signrank

2010-08-09 Thread Alain Guillet

 Hi,

Look at the output of the test you ran in R: it is a Wilcoxon rank sum
test, not a Wilcoxon signed rank test.

If there are ties, I prefer wilcox.exact from the exactRankTests package.

Alain

On 09-Aug-10 12:43, Capasia wrote:

This is my first post to the mailing list and I guess it's a pretty stupid
question but I can't figure it out. I hope this is the right forum for this
kind of question.

Before I started using R I was using STATA to run a Wilcoxon signed-rank
test on two variables. See data below:

https://spreadsheets.google.com/pub?key=0ApodAA2GAEP_dDZkdzZHSFBqX1JHOWJBX1dMQUZCVkE&hl=en&output=html

STATA Output:
. signrank x=y

Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |       41        3101      2330.5
    negative |       18        1560      2330.5
        zero |       49        1225        1225
-------------+---------------------------------
         all |      108        5886        5886

unadjusted variance   106438.50
adjustment for ties     -282.38
adjustment for zeros  -10106.25
                      ---------
adjusted variance      96049.88

Ho: transfer_2_a = transfer_2_b
             z =   2.486
    Prob > |z| =   *0.0129*

When running a Wilcoxon signed-rank test in R:

> wilcox.test(datablatt$x, datablatt$y)

Wilcoxon rank sum test with continuity correction

data:  datablatt$x and datablatt$y
W = 7059.5, p-value = *0.09197*
alternative hypothesis: true location shift is not equal to 0

As you can see, the p-values are different (one with H0 rejection and the
other one not). I tested whether it could be that the STATA one isn't paired
but this doesn't seem to be the problem.

I'm dumbfounded as to what could lead to such a difference. I couldn't find
any settings I have missed, but somehow I guess I'm using the function in the
wrong way...
Any ideas?
Thanks a lot in advance!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Alain Guillet
Statistician and Computer Scientist

SMCS - IMMAQ - Université catholique de Louvain
Bureau c.316
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium

tel: +32 10 47 30 50



Re: [R] Difference Between R: wilcox.test and STATA: signrank

2010-08-09 Thread peter dalgaard

On Aug 9, 2010, at 3:03 PM, Alain Guillet wrote:

 Hi,
 
 Look at the output of the test made in R and you can see it is a Wilcoxon 
 rank sum test and not a Wilcoxon signed rank test.

It might be helpful to add that paired=TRUE is needed in the call to get the 
signed-rank test.
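
As a minimal sketch of that point (the vectors below are made-up illustration data, not the OP's), the same two vectors give a rank sum test by default and a signed rank test only when paired = TRUE is set. exact = FALSE is passed here only so that the method label does not depend on whether an exact p-value can be computed:

```r
# Hypothetical paired measurements; not the data from this thread.
x <- c(1.5, 2.0, 3.0, 0.5, 2.5, 3.5, 1.0, 5.0)
y <- c(1.0, 2.0, 2.0, 0.0, 1.5, 3.0, 1.1, 3.5)

# Default call: x and y are treated as two independent samples.
unpaired <- wilcox.test(x, y, exact = FALSE)

# paired = TRUE: the test is run on the differences x - y.
paired <- wilcox.test(x, y, paired = TRUE, exact = FALSE)

print(unpaired$method)
print(paired$method)
```

The first line of the printed output (the method name) is the quickest way to check which of the two tests was actually run.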

 If there are ties, I know I prefer wilcox.exact from the exactRankTests.
 

(Not that much of an issue in larger sample sizes, I'd say. Even with binary 
data, the normal approximation works reasonably well under the usual 
assumptions of expected counts > 5, since the tie-adjustment for the variance 
is exact for the distribution of the ranks. The continuity correction doesn't 
quite work though. Anyways, wilcox.exact is of course a nice thing to have.)
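
A rough way to see how close the normal approximation gets is to compute both p-values on the same data. The sketch below uses simulated continuous data (so no ties or zeros, and an arbitrary seed and sample size), which means it does not reproduce the binary or heavily tied case discussed above; it only shows how the two p-values can be put side by side:

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
n <- 30      # arbitrary sample size for illustration
x <- rnorm(n, mean = 0.3)
y <- rnorm(n)

# Exact null distribution (available here because there are no ties or zeros).
p_exact <- wilcox.test(x, y, paired = TRUE, exact = TRUE)$p.value

# Normal approximation with continuity correction.
p_normal <- wilcox.test(x, y, paired = TRUE, exact = FALSE,
                        correct = TRUE)$p.value

print(c(exact = p_exact, normal = p_normal))
```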



-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Difference Between R: wilcox.test and STATA: signrank

2010-08-09 Thread David Winsemius


On Aug 9, 2010, at 9:52 AM, peter dalgaard wrote:



On Aug 9, 2010, at 3:03 PM, Alain Guillet wrote:


Hi,

Look at the output of the test made in R and you can see it is a  
Wilcoxon rank sum test and not a Wilcoxon signed rank test.


It might be helpful to add that paired=TRUE is needed in the call to  
get the signed-rank test.


If there are ties, I know I prefer wilcox.exact from the  
exactRankTests.




(Not that much of an issue in larger sample sizes, I'd say. Even
with binary data, the normal approximation works reasonably well
under the usual assumptions of expected counts > 5, since the
tie-adjustment for the variance is exact for the distribution of the
ranks. The continuity correction doesn't quite work though. Anyways,
wilcox.exact is of course a nice thing to have.)


The OP's data:

> table(xvals=dat$x, yvals=dat$y)
      yvals
xvals   0 0.25 0.5  1 1.1 1.5  2  3 3.5  5 5.5  6  8
  0    35    0   0  1   0   1  2  1   0  0   0  0  0
  0.5   2    1   1  0   0   0  0  0   0  0   0  0  0
  0.75  0    0   1  0   0   0  0  0   0  0   0  0  0
  1     7    0   1  3   0   0  1  0   1  0   0  0  0
  1.1   0    0   0  0   1   0  0  0   0  0   0  0  0
  1.5   1    1   0  4   0   2  0  0   0  0   0  0  0
  2     3    0   0  6   0   2  4  2   1  0   0  0  0
  2.1   0    0   1  0   0   0  0  0   0  0   0  0  0
  2.5   0    0   0  0   0   1  0  0   0  2   0  0  0
  3     2    0   0  0   0   0  5  3   1  1   0  0  0
  3.3   1    0   0  0   0   0  0  0   0  0   0  0  0
  3.33  0    0   0  1   0   0  0  0   0  0   0  0  0
  3.5   0    0   0  1   0   1  0  0   0  1   1  0  0
  5     0    0   0  0   0   0  0  0   0  2   0  1  1
  10    0    0   0  0   0   0  0  0   0  1   0  0  0


Adding paired=TRUE to the wilcox.test call gives the signed rank test,
although that is not likely to satisfy the OP since she seems to be
expecting a higher degree of congruence with Stata.


The wilcox.test and wilcox.exact give results that only differ at the  
4th decimal place.


> wilcox.test(dat$x, dat$y, paired=TRUE)

        Wilcoxon signed rank test with continuity correction

data:  dat$x and dat$y
V = 1181, p-value = 0.08872
alternative hypothesis: true location shift is not equal to 0

> wilcox.exact(dat$x, dat$y, paired=TRUE)

        Asymptotic Wilcoxon signed rank test

data:  dat$x and dat$y
V = 1181, p-value = 0.08805
alternative hypothesis: true mu is not equal to 0

The Stata output indicates some sort of adjustment for zeros. The  
wilcox.test basically throws out the zeros (presumably the zero  
differences), so there may be a difference in the algorithm. Her data  
has 51 zero differences and 61 non-zero differences.


> sum(dat$x==dat$y)
[1] 51
> sum(dat$x!=dat$y)
[1] 61
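
To make the "throws out the zeros" remark concrete, here is a small sketch of the statistic V computed by hand on made-up paired data (not the OP's): zero differences are dropped, the remaining absolute differences are ranked with mid-ranks for ties, and the ranks of the positive differences are summed. The result is compared with what wilcox.test reports:

```r
# Hypothetical paired data with several zero differences.
x <- c(0, 0, 1, 2, 3, 1.5, 2,   5, 0.5, 3)
y <- c(0, 1, 1, 1, 3, 0.5, 3.5, 2, 0,   3)

d <- x - y
d_nonzero <- d[d != 0]             # zero differences are dropped first
r <- rank(abs(d_nonzero))          # mid-ranks of the absolute differences
V_manual <- sum(r[d_nonzero > 0])  # sum of ranks of the positive differences

# Statistic reported by wilcox.test for the same data.
V_r <- unname(wilcox.test(x, y, paired = TRUE, exact = FALSE)$statistic)

print(c(manual = V_manual, wilcox.test = V_r))
```

Because the zeros never enter the ranking here, the effective sample size is the number of non-zero differences, which is one place where a procedure that keeps the zeros in the ranking can give a different answer.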

Wait a minute; the Stata report said she had 49 zeros and only 108  
records.
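
The variance lines in the Stata output quoted in this thread can be checked by hand. The sketch below assumes the usual large-sample formulas in which all n observations are ranked, n(n+1)(2n+1)/24 is the unadjusted variance, and the same expression in the number of zeros is subtracted. That assumption is inferred from the reported numbers matching, not taken from Stata's documentation, and the tie adjustment is simply copied from the output because it cannot be recomputed without the data:

```r
# Figures copied from the Stata output quoted in this thread.
n        <- 108      # all observations
n_zero   <- 49       # zero differences
tie_adj  <- -282.38  # "adjustment for ties", taken as reported
sum_pos  <- 3101     # sum of ranks of the positive differences
expected <- 2330.5   # expected sum of ranks under H0

# Assumed formulas (they reproduce the reported values).
var_unadj <- n * (n + 1) * (2 * n + 1) / 24
zero_adj  <- -n_zero * (n_zero + 1) * (2 * n_zero + 1) / 24
var_adj   <- var_unadj + tie_adj + zero_adj

z <- (sum_pos - expected) / sqrt(var_adj)
p <- 2 * pnorm(-abs(z))  # two-sided p-value from the normal approximation

print(round(c(var_unadj = var_unadj, zero_adj = zero_adj, var_adj = var_adj), 2))
print(round(c(z = z, p = p), 4))
```

Under these assumptions the zeros reduce the variance rather than the sample size, and no continuity correction is applied, so some disagreement with wilcox.test would be expected even on identical data.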


Different data. Different results. I suppose it could be my editing  
errors. Taking out all the extraneous html junk and restoring missing  
delimiters was kind of a pain.


Capasia: don't use Google Sheets to transmit data. Instead, use dput
on the datablatt object and just post the output.
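
A minimal sketch of the dput suggestion, using a tiny made-up data frame in place of the real datablatt object: dput prints R code that rebuilds the object, so readers can paste it straight into their session without any manual clean-up.

```r
# Hypothetical stand-in for the OP's data frame.
datablatt <- data.frame(x = c(0, 0.5, 1, 2), y = c(0, 0.25, 1, 1.5))

# Print a plain-text representation that can be pasted into an email.
dput(datablatt)

# Round trip: capture the text and rebuild the object from it,
# which is what a reader does when pasting the dput output.
txt <- paste(capture.output(dput(datablatt)), collapse = "\n")
restored <- eval(parse(text = txt))

print(identical(datablatt, restored))
```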


--
David.



