Re: [R] Wilcoxon signed rank test and its requirements

Greg Snow Sat, 26 Jun 2010 13:31:40 -0700

No, I mean something like this. Assume that the iris dataset contains the full 
population and we want to see whether setosa has a different mean than the 
population (the null would be that there is no difference in sepal width 
between species, or that species tells nothing about sepal width):



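# sampling distribution: means of 50 values drawn at random from the population
out1 <- replicate( 100000, mean(sample(iris$Sepal.Width, 50)) )
# observed mean of the predefined subset (rows 1:50 are the setosa flowers)
obs1 <- mean( iris$Sepal.Width[1:50] )

# where does the observed mean fall relative to the random means?
hist(out1, xlim=range(out1,obs1))
abline(v=obs1)

# proportion of random means above the observed one (upper-tail p-value)
mean( out1 > obs1 )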
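
The same idea can be wrapped in a small helper so that it can be reused for any
predefined subset. This is only a sketch: the function name
subset_vs_population and its arguments are made up for illustration and are not
part of the original message. It reports both tail proportions, since the
direction of interest may differ from case to case.

```r
# Hypothetical helper (illustrative names, not from the original post).
# Compares the mean of a predefined subset with the means of random
# subsets of the same size drawn without replacement from the population.
subset_vs_population <- function(population, subset_values, n_rep = 10000) {
  n_sub <- length(subset_values)
  obs <- mean(subset_values)
  # approximate sampling distribution of the mean for subsets of size n_sub
  sim <- replicate(n_rep, mean(sample(population, n_sub)))
  list(
    observed = obs,
    p_upper  = mean(sim >= obs),  # share of random means at least as large
    p_lower  = mean(sim <= obs)   # share of random means at least as small
  )
}

set.seed(42)  # only to make the sketch reproducible
res <- subset_vs_population(
  iris$Sepal.Width,
  iris$Sepal.Width[iris$Species == "setosa"]
)
print(res)
```

Tied values in the population are not a problem here, because nothing in the
procedure relies on ranks or on a continuous distribution.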


I don't have a reference (other than a textbook that defines sampling 
distributions).

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

From: Atte Tenkanen [mailto:atte...@utu.fi]
Sent: Friday, June 25, 2010 10:08 PM
To: Atte Tenkanen
Cc: Greg Snow; David Winsemius; R mailing list
Subject: Re: [R] Wilcoxon signed rank test and its requirements


Atte Tenkanen wrote on 26.6.2010 at 5.15:



Greg Snow wrote on 25.6.2010 at 21.55:


Let me see if I understand.  You actually have the data for the whole 
population (the entire piece) but you have some pre-defined sections that you 
want to see if they differ from the population, or more meaningfully they are 
different from a randomly selected set of measures.  Is that correct?

If so, since you have the entire population of interest you can create the 
actual sampling distribution (or a good approximation of it).  Just take random 
samples from the population of the given size (matching the subset you are 
interested in) and calculate the means (or other value of interest), probably 
10,000 to 1,000,000 samples.  Now compare the value from your predefined subset 
to the set of random values you generated to see if it is in the tail or not.

Let me check: so you mean doing it this way:

t.test(sample(POPUL, length(SAMPLE), replace = FALSE), mu=mean(SAMPLE), alt = 
"less")

NO, this way:

t.test(POPUL[sample(1:length(POPUL), length(SAMPLE), replace = FALSE)], 
mu=mean(SAMPLE), alt = "less")

Atte



Atte



--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] 
On Behalf Of Atte Tenkanen
Sent: Thursday, June 24, 2010 11:04 PM
To: David Winsemius
Cc: R mailing list
Subject: Re: [R] Wilcoxon signed rank test and its requirements

The values come from this kind of process:
The musical composition is segmented into so-called 'pitch-class
segments', and these segments are compared with one reference set using
a distance function. Only some distance values are possible. The
distance values can be averaged over music bars, which produces a
smoother distribution, and the resulting 'comparison curve', which
shows the distances to the reference set through a musical piece, is
more readable (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I
would prefer to use the original values.

Then I want to pick only some regions from the piece and test whether
the values in those regions are higher than the mean of all values.

Atte

On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:

Is there anything for me?

There is a lot of data, n=2418, but there are also a lot of ties.
My sample size is n = 250-300.


I do not understand why there should be so many ties. You have not
described the measurement process or units. ( ... although you offer
a glimpse without much background later.)

I would like to test whether the mean of the sample differs
significantly from the population mean.

Why? What is the purpose of this investigation? Why should the mean
of a sample be that important?


The histogram of the population looks like the attached one.
What test should I use? No choices?

This distribution comes from a musical piece and the values are
'tonal distances'.

http://users.utu.fi/attenka/Hist.png

That picture does not offer much insight into the features of that
measurement. It appears to have much more structure than I would
expect for a sample from a smooth unimodal underlying population.

--
David.


Atte

On 06/24/2010 12:40 PM, David Winsemius wrote:

On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

Thanks. What I wanted to ask is this:

How do you test that the data is symmetric enough?
If it is not, is it OK to use some data transformation?

I ask because it is said:

"The Wilcoxon signed rank test does not assume that the data are
sampled from a Gaussian distribution. However it does assume that
the data are distributed symmetrically around the median. If the
distribution is asymmetrical, the P value will not tell you much
about whether the median is different than the hypothetical value."

You are being misled. Simply finding a statement on a statistics
software website, even one as reputable as Graphpad (???), does not
mean that it is necessarily true. My understanding (confirmed by
reviewing "Nonparametric statistical methods for complete and censored
data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon
signed-rank test does not require that the underlying distributions be
symmetric. The above quotation is highly inaccurate.
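
One way to look at this disagreement empirically is a small simulation. The
sketch below is an illustration added under stated assumptions, not anything
from the thread: the sample size, the number of replications, the 0.05 level
and the two distributions (a standard normal, and an exponential shifted so
that its median is exactly 0) are arbitrary choices. It estimates how often
wilcox.test rejects mu = 0 when the true median really is 0.

```r
set.seed(2010)   # arbitrary seed, only for reproducibility
n     <- 100     # observations per simulated sample (assumed)
n_sim <- 2000    # number of simulated samples (assumed)

# Share of simulated samples in which the signed-rank test rejects mu = 0
reject_rate <- function(rdist) {
  mean(replicate(n_sim, wilcox.test(rdist(n), mu = 0)$p.value < 0.05))
}

# Symmetric case: standard normal, median 0
rate_sym  <- reject_rate(function(n) rnorm(n))

# Asymmetric case: Exp(1) shifted by log(2), so the median is still 0
rate_skew <- reject_rate(function(n) rexp(n) - log(2))

# For the shifted exponential, P(x_i + x_j > 0) for two independent draws;
# the sum of two Exp(1) variables is Gamma(shape = 2, rate = 1)
p_pair <- pgamma(2 * log(2), shape = 2, lower.tail = FALSE)

print(c(symmetric = rate_sym, skewed = rate_skew, p_pair = p_pair))
```

In the symmetric case the rejection rate should stay near the nominal level,
while in the skewed case it should be clearly higher even though the median is
0. Whether that is described as a violated assumption or as the test addressing
a different null hypothesis is exactly what is being discussed here.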


To add to what David and others have said, look at the kernel that the
U-statistic associated with the WSR test uses: the indicator (0/1) of
xi + xj > 0.  So WSR tests H0: p = 0.5, where p = the probability that
the average of a randomly chosen pair of values is positive.  [If there
are ties this probably needs to be worded as
P[xi + xj > 0] = P[xi + xj < 0], i neq j.]
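
The connection between the signed-rank statistic and pairwise sums can be
checked numerically. The sketch below uses made-up continuous data (so there
are no ties or zeros) and counts the pairs with i <= j whose sum is positive;
for such data this count equals the statistic V reported by wilcox.test. The
variable names are illustrative only.

```r
set.seed(1)                    # arbitrary seed for reproducibility
x <- rnorm(20, mean = 0.4)     # made-up continuous data: no ties, no zeros

# matrix of all pairwise sums x_i + x_j
pair_sums <- outer(x, x, "+")

# number of pairs with i <= j (diagonal included) whose sum is positive
v_walsh <- sum(pair_sums[upper.tri(pair_sums, diag = TRUE)] > 0)

# signed-rank statistic V as reported by wilcox.test
v_test <- unname(wilcox.test(x)$statistic)

# estimate of p = P(x_i + x_j > 0) over pairs with i < j
p_hat <- mean(pair_sums[upper.tri(pair_sums)] > 0)

print(c(v_walsh = v_walsh, v_test = v_test, p_hat = p_hat))
```

The diagonal terms (i = j) contribute to V but become negligible relative to
the off-diagonal pairs as n grows, which is why the test is usually described
in terms of pairs of distinct observations.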

Frank

--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


