Re: [R] shapiro.test

2014-02-25 Thread Keith S Weintraub
Regarding : ... I don't know
what the 4th to last page would be called (could add another ante-, or
in R just use tail(book,4))...

According to wordsmith.org (signing up is free; note that I have no affiliation with 
that site) the word is preantepenultimate.


Check it out:
http://wordsmith.org/words/preantepenultimate.html

Yes it's off-topic. Please write your congressman.

KW

--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] shapiro.test

2014-02-24 Thread Greg Snow
Philippe,

replies inline

On Sat, Feb 22, 2014 at 12:29 AM, Philippe Grosjean
phgrosj...@sciviews.org wrote:
 Greg,

 I really like that TeachingDemos::SnowsPenultimateNormalityTest()...

If you like that function then you may appreciate
TeachingDemos::SnowsCorrectlySizedButOtherwiseUselessTestOfAnything,
which I suspect (but have been too lazy to check) may be the longest
exported function name in a CRAN package.  I justify the names of
these 2 functions using the same logic that suggests short and simple
names for functions that you would expect to be used often.

 even the tortuous way to always return a p-value == 0:


It turns out (discovered by accident and then brought to my attention)
that if you run SnowsPenultimateNormalityTest on a vector of length 0
then it does return a p-value of 1.  I have not yet decided if this is
a bug or a feature.  On one hand it makes sense that a sample of size
0 is perfectly consistent with the assumption that you chose 0
observations from a normal distribution, on the other hand, if it is
an integer or double vector of length 0 that would still be
information that the numbers (or lack thereof) are rational.

[snip]

 I am just curious... Are there teachers out there pointing to that test? If 
 yes, what fraction of the students realise what happens? I guess, it is 
 closer to zero than to one, unfortunately. Wait... I need another 
 SnowsPenultimateXxxxTest() here to check the null hypothesis that all my 
 students are doing what they are supposed to do when discovering a new 
 statistical tool!

I don't know of any teachers pointing to the test; I would want to be
careful which class to bring it up in.  For some students it could
result in an epiphany, others may just blindly use it, and still
others may have their heads explode if they have to think too hard
about it.

I was originally considering naming the test SnowsAntepenultimateTest
to give a little more room for follow-up tests, but at the time I
could not remember if it was Ante (before) or Anti (opposite).  I
learned the word Antepenultimate in terms of pages in a book, where
the 3rd to last page (the Antepenultimate page) is directly opposite
(Anti-) the Penultimate page.

Just in case that is not confusing enough, the ultimate page of a
cheap detective novel is the last page where the hero realizes that
since the motive for the murder was to cover up the murderer's
embezzlement of the family fortune to pay off his bookie, the hero
will not be paid after all and will still need to continue avoiding
his loan shark.  The penultimate page is the second to last page where
in response to the hero's listing of circumstantial evidence the
murderer conveniently confesses and fills in all the missing details
saving the embarrassment to the hero if he had just lawyer-ed up and
been acquitted due to lack of hard evidence.  And the antepenultimate
page is the 3rd to last where the hero utters the clichéd phrase "You
are probably wondering why I gathered you all here."  I don't know
what the 4th to last page would be called (could add another ante-, or
in R just use tail(book,4)).



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] shapiro.test

2014-02-24 Thread Clint Bowman

Greg,

For some authors the 4th page from the back should be the first page.

Not so for you, however.

Clint

Clint Bowman            INTERNET:   cl...@ecy.wa.gov
Air Quality Modeler     INTERNET:   cl...@math.utah.edu
Department of Ecology   VOICE:      (360) 407-6815
PO Box 47600            FAX:        (360) 407-7534
Olympia, WA 98504-7600

USPS:       PO Box 47600, Olympia, WA 98504-7600
Parcels:    300 Desmond Drive, Lacey, WA 98503-1274



Re: [R] shapiro.test

2014-02-22 Thread Rui Barradas

Hello,

Inline

On 21-02-2014 23:13, Rolf Turner wrote:

On 22/02/14 11:04, Rui Barradas wrote:

Hello,

Not answering directly to your question, if the sample size is a
documented problem with shapiro.test and you want a normality test, why
don't you use ?ks.test?

m <- mean(HP_TrinityK25$V2)
s <- sd(HP_TrinityK25$V2)

ks.test(HP_TrinityK25$V2, pnorm, m, s)


Strictly speaking this is not a valid test.  The KS test is used for
testing against a *completely specified* distribution.  If there are
parameters to be estimated, the null distribution is no longer
applicable.  This may not be a real problem if the parameters are
*well* estimated, as they would be in this instance (given that the
sample size is over-large).  I'm not sure about this.


Yes, you're right. I hesitated before posting my answer precisely 
because of this: the parameters must be pre-determined constants, not 
computed from the data. As Greg pointed out in his reply, the help 
page for ?ks.test also states this explicitly (which I had missed).


The chi-squared gof test seems to be a good choice, given the sample size.

Rui Barradas


The Lilliefors test is theoretically available in this context when
mu and sigma are estimated, but according to the Wikipedia article, the
Lilliefors distribution is not known analytically and the critical
values must be determined by Monte Carlo methods.  There is a
LillieTest function in the DescTools package which makes use of some
approximations to get p-values.

However I think that a better approach would be to use a chi-squared
goodness of fit test whereby you can adjust for estimated parameters
simply by reducing the degrees of freedom.  I believe that the
chi-squared test is somewhat low in power, but with a very large sample
this should not be a problem.

The difficulty with the chi-squared test is that the choice of bins is
somewhat arbitrary.  I believe the best approach is to take the bin
boundaries to be the quantiles of the normal distribution (with
parameters m and s) corresponding to equispaced probabilities on
[0,1], with the number of such probabilities being k+1 where
k = floor(n/5), n being the sample size.  This makes the expected counts
all equal to n/k = 5 so that the chi-squared test is valid.  The
degrees of freedom are then k-3 (k - 1 - #estimated parameters).

One last comment:  I believe that it is generally considered that
testing for normality is a waste of time and a pseudo-intellectual
exercise of academic interest at best.

cheers,

Rolf Turner






Re: [R] shapiro.test

2014-02-22 Thread Rui Barradas

Second.

Rui Barradas

On 21-02-2014 23:44, Rolf Turner wrote:

On 22/02/14 11:53, Greg Snow wrote:

SNIP



Why are you testing your data for normality?  For large sample sizes
the normality tests often give a meaningful answer to a meaningless
question (for small samples they give a meaningless answer to a
meaningful question).


SNIP

Fortune!!!

cheers,

Rolf



[R] shapiro.test

2014-02-21 Thread Gonzalo Villarino Pizarro
Dear R users,
Please help with this maybe basic question. I am trying to see if my
data are normal, but it is a large file and the test does not work.
I keep getting the message : Error in shapiro.test(x = HP_TrinityK25$V2)
:  sample size must be between 3 and 5000
thanks!

 shapiro.test(x=HP_TrinityK25$V2)
Error in shapiro.test(x = HP_TrinityK25$V2) : sample size must be between 3
and 5000

##Note:
HP_TrinityK25= my file
HP_TrinityK25$V2= data in my file
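
For anyone who wants to reproduce the message without the original file, here is a minimal sketch using simulated data. The vector x is made up and only stands in for HP_TrinityK25$V2; the point is simply that shapiro.test() refuses more than 5000 non-missing values.

```r
# Simulated stand-in for the poster's data (hypothetical, not the real file)
set.seed(1)
x <- rnorm(6000)

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    shapiro.test(x)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```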

[[alternative HTML version deleted]]



Re: [R] shapiro.test

2014-02-21 Thread Rui Barradas

Hello,

Not answering directly to your question, if the sample size is a 
documented problem with shapiro.test and you want a normality test, why 
don't you use ?ks.test?


m <- mean(HP_TrinityK25$V2)
s <- sd(HP_TrinityK25$V2)

ks.test(HP_TrinityK25$V2, pnorm, m, s)


Hope this helps,

Rui Barradas



Re: [R] shapiro.test

2014-02-21 Thread Greg Snow
Rui,

Note this quote from the last paragraph of the Details section of ?ks.test:

If a single-sample test is used, the parameters specified in '...'
 must be pre-specified and not estimated from the data.

Which is the exact opposite of your example.



Gonzalo,

Why are you testing your data for normality?  For large sample sizes
the normality tests often give a meaningful answer to a meaningless
question (for small samples they give a meaningless answer to a
meaningful question).

If you really feel the need for a p-value then
SnowsPenultimateNormalityTest in the TeachingDemos package will work
for large sample sizes.  But note that the documentation for that
function is considered more useful than the function itself.






-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] shapiro.test

2014-02-21 Thread Rolf Turner

On 22/02/14 11:04, Rui Barradas wrote:

Hello,

Not answering directly to your question, if the sample size is a
documented problem with shapiro.test and you want a normality test, why
don't you use ?ks.test?

m <- mean(HP_TrinityK25$V2)
s <- sd(HP_TrinityK25$V2)

ks.test(HP_TrinityK25$V2, pnorm, m, s)


Strictly speaking this is not a valid test.  The KS test is used for 
testing against a *completely specified* distribution.  If there are 
parameters to be estimated, the null distribution is no longer 
applicable.  This may not be a real problem if the parameters are 
*well* estimated, as they would be in this instance (given that the 
sample size is over-large).  I'm not sure about this.


The Lilliefors test is theoretically available in this context when
mu and sigma are estimated, but according to the Wikipedia article, the 
Lilliefors distribution is not known analytically and the critical 
values must be determined by Monte Carlo methods.  There is a 
LillieTest function in the DescTools package which makes use of some 
approximations to get p-values.
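
To make the Monte Carlo idea concrete, here is a rough sketch of how a Lilliefors-type p-value could be simulated by hand. The function name lilliefors_mc and the argument B are my own inventions for illustration; this is not the DescTools implementation. It relies on the fact that the KS distance of a standardized sample does not depend on the true mean and sd, so the null distribution can be simulated from N(0, 1).

```r
# Monte Carlo p-value for a KS-type normality test with estimated mean and sd.
# lilliefors_mc and B are hypothetical names used only in this sketch.
lilliefors_mc <- function(x, B = 2000) {
  x <- x[!is.na(x)]
  n <- length(x)

  # KS distance between the standardized sample and the N(0, 1) CDF
  ks_stat <- function(y) {
    z <- sort((y - mean(y)) / sd(y))
    p <- pnorm(z)
    i <- seq_along(z)
    max(i / length(z) - p, p - (i - 1) / length(z))
  }

  d_obs <- ks_stat(x)
  # Null distribution: same statistic on simulated normal samples of size n
  d_null <- replicate(B, ks_stat(rnorm(n)))

  list(statistic = d_obs,
       p.value = (1 + sum(d_null >= d_obs)) / (B + 1))
}

set.seed(42)
res_norm <- lilliefors_mc(rnorm(200), B = 500)  # data that really are normal
res_exp  <- lilliefors_mc(rexp(200),  B = 500)  # clearly skewed data
print(res_norm$p.value)
print(res_exp$p.value)
```

With a very large sample the simulation is slower but still feasible, since each replicate only needs a sort and a call to pnorm.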


However I think that a better approach would be to use a chi-squared 
goodness of fit test whereby you can adjust for estimated parameters 
simply by reducing the degrees of freedom.  I believe that the 
chi-squared test is somewhat low in power, but with a very large sample 
this should not be a problem.


The difficulty with the chi-squared test is that the choice of bins is 
somewhat arbitrary.  I believe the best approach is to take the bin 
boundaries to be the quantiles of the normal distribution (with 
parameters m and s) corresponding to equispaced probabilities on 
[0,1], with the number of such probabilities being k+1 where
k = floor(n/5), n being the sample size.  This makes the expected counts 
all equal to n/k, which is at least 5, so that the chi-squared test is valid.  The 
degrees of freedom are then k-3 (k - 1 - #estimated parameters).
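
The binning recipe above could be sketched roughly as follows. The function name chisq_normal_gof is made up for illustration, and the data are simulated; treat it as a sketch of the described procedure under those assumptions, not a vetted implementation.

```r
# Chi-squared goodness-of-fit test for normality with estimated mean and sd,
# using k equiprobable bins under the fitted normal distribution.
# chisq_normal_gof is a hypothetical name used only in this sketch.
chisq_normal_gof <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  k <- floor(n / 5)            # number of bins; expected count n/k is >= 5
  stopifnot(k > 3)             # otherwise no degrees of freedom are left
  m <- mean(x)
  s <- sd(x)

  # k + 1 equispaced probabilities on [0, 1] give boundaries from -Inf to Inf
  breaks <- qnorm(seq(0, 1, length.out = k + 1), mean = m, sd = s)

  # Observed counts per bin (integer bin codes, then tabulate)
  observed <- tabulate(cut(x, breaks = breaks, labels = FALSE), nbins = k)
  expected <- n / k

  stat <- sum((observed - expected)^2 / expected)
  df <- k - 3                  # k - 1 - 2 estimated parameters
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

set.seed(1)
res <- chisq_normal_gof(rnorm(10000))
print(res)
```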


One last comment:  I believe that it is generally considered that 
testing for normality is a waste of time and a pseudo-intellectual 
exercise of academic interest at best.


cheers,

Rolf Turner






Re: [R] shapiro.test

2014-02-21 Thread Rolf Turner

On 22/02/14 11:53, Greg Snow wrote:

SNIP



Why are you testing your data for normality?  For large sample sizes
the normality tests often give a meaningful answer to a meaningless
question (for small samples they give a meaningless answer to a
meaningful question).


SNIP

Fortune!!!

cheers,

Rolf



Re: [R] shapiro.test

2014-02-21 Thread Bert Gunter
Second!!

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch






Re: [R] shapiro.test

2014-02-21 Thread Philippe Grosjean
Greg,

I really like that TeachingDemos::SnowsPenultimateNormalityTest()… even the 
tortuous way to always return a p-value == 0:

# the following function works for current implementations of R
# to my knowledge, eventually it may need to be expanded
is.rational <- function(x){
  rep( TRUE, length(x) )
}

tmp.p <- if( any(is.rational(x))) {
  0
} else {
  # current implementation will not get here if length
  # of x is positive.  This part is reserved for the
  # ultimate test
  1
}

(p.value is then returned as tmp.p). Also, the nice and sexy printing of that 
p-value in R as:

p-value < 2.2e-16

which looks much more serious than 'p-value = 0'… Here you have nothing to do. 
The stats::format.pval() function called from stats:::print.htest() already 
does the job for you!
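
A small sketch of that formatting step, for the curious. The digits value of 4 is my assumption about what print.htest() ends up passing with the default options(digits = 7); the paste() line only mimics how the printed label is put together.

```r
# A p-value of exactly 0 is below .Machine$double.eps, so format.pval()
# reports it as "less than eps" rather than as 0.
fp <- format.pval(0, digits = 4)
print(fp)

# Mimic the label that ends up in the printed test output
label <- paste("p-value", if (grepl("^<", fp)) fp else paste("=", fp))
print(label)
```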

I am just curious… Are there teachers out there pointing to that test? If yes, 
what fraction of the students realise what happens? I guess, it is closer to 
zero than to one, unfortunately. Wait… I need another 
SnowsPenultimateXxxxTest() here to check the null hypothesis that all my 
students are doing what they are supposed to do when discovering a new 
statistical tool!

Best,

Philippe Grosjean





[R] shapiro.test()

2012-06-26 Thread reso

Hey,
today I wanted to use the shapiro.test() on data containing 3  
numerical values per group.

It is the first time that an NA was given back for some of the groups.
In the following an example of code and output is shown:



shapiro.test(c(0.000637806, 0.00175561, 0.001196708))


Shapiro-Wilk normality test

data:  c(0.000637806, 0.00175561, 0.001196708)
W = 1, p-value = NA

I am not able to find the bug in our data, so I think there might be a  
problem with the shapiro.test().


I use the following technical background:

platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          14.1
year           2011
month          12
day            22
svn rev        57956
language       R
version.string R version 2.14.1 (2011-12-22)


Thanks,
Judith



Re: [R] shapiro.test()

2012-06-26 Thread Özgür Asar
See

?shapiro.test

...the number of non-missing values must be between 3 and 5000.

By the way, how reasonable is it to test normality with only 3 values?

Best
ozgur

--
View this message in context: 
http://r.789695.n4.nabble.com/shapiro-test-tp4634513p4634520.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] shapiro.test()

2012-06-26 Thread Özgür Asar
Actually, your sample size is 3. Sorry for that.

Ozgur

--
View this message in context: 
http://r.789695.n4.nabble.com/shapiro-test-tp4634513p4634525.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] shapiro.test()

2012-06-26 Thread peter dalgaard

On Jun 26, 2012, at 16:43 , r...@uni-potsdam.de wrote:

 Hey,
 today I wanted to use the shapiro.test() on data containing 3 numerical 
 values per group.
 It is the first time that an NA was given back for some of the groups.
 In the following an example of code and output is shown:
 
 
 shapiro.test(c(0.000637806, 0.00175561, 0.001196708))
 
   Shapiro-Wilk normality test
 
 data:  c(0.000637806, 0.00175561, 0.001196708)
 W = 1, p-value = NA
 
 I am not able to find the bug in our data, so I think there might be a 
 problem with the shapiro.test().

The clue is that

 diff(sort(c(0.000637806, 0.00175561, 0.001196708)))
[1] 0.000558902 0.000558902

which is either an extreme coincidence or a sign that your data are not 
independent samples from a continuous distribution. Since the normal quantiles 
are also equidistant, you get a correlation of W=1 in the QQ-plot, and 
apparently this triggers the NA p-value. 

I suppose returning p=1.0 would arguably be a better choice for this case, but 
it _is_ pretty extreme. 
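
The equal spacing and the resulting perfect QQ correlation can be checked with a few lines. This is only a sketch of the reasoning above: it uses ppoints() for the plotting positions, as qqnorm() does, and does not reproduce the internals of shapiro.test().

```r
x <- c(0.000637806, 0.00175561, 0.001196708)

# Gaps between the sorted values: both are (numerically) the same
gaps <- diff(sort(x))
print(gaps)

# Normal quantiles for n = 3 are symmetric about 0, hence also equidistant
q <- qnorm(ppoints(3))
print(diff(q))

# Equidistant data against equidistant quantiles: correlation of 1
r <- cor(sort(x), q)
print(r)
```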

-pd

 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] shapiro.test

2010-05-27 Thread Stephan Kolassa

Hi,

David Winsemius wrote:
snip

This would imply that ozon is a list or dataframe.


snip


And you tried to give the whole list to a function that only wants a 
vector.


And whenever you suspect that your data types clash, try str() to find 
out just what kind of thing your data is. Here: str(ozon)


HTH,
Stephan



Re: [R] shapiro.test

2010-05-27 Thread Greg Snow
Others pointed out that the error message is due to ozon being a data frame, 
but I think the true source of confusion comes a bit earlier.  You really need 
to understand more about data objects and the search path.

You first read in a table and name it tab1.

Then you attach tab1 to the search path (there are better ways than attach now; 
while it can be a useful tool, it can also easily lead to problems like the one 
you are seeing).

The warning from attach tells you that there are now 2 things in the search 
path with the name ozon, one of which is an object in the global environment, 
the other is one of the columns of tab1.  The warning also tells you that the 
object in the global environment masks (or will take precedence over) the tab1 
column.

You then print out the 'V1' column of the ozon object (which has nothing to do 
with the ozon column in tab1).

Then you do the Shapiro test. Given that you show us reading in and attaching 
tab1, I would assume that you want the test done on the ozon column of tab1, 
but R finds the ozon object in the global environment before it finds the 
column in tab1, and you get the error.

Remember that computers are stupid: they do exactly what they are told to do, 
so tell R exactly what you want it to do.  Either remove the ozon object so 
that it is not found first, or use commands like:

 shapiro.test(tab1$ozon)
 with(tab1, shapiro.test(ozon))
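
To make the masking concrete, here is a small sketch with made-up data. Only
the names tab1 and ozon come from the thread; the values are simulated and
the stray global ozon is assumed to be a data frame, as the ozon$V1 output
suggests:

```r
# Simulated stand-in for the poster's file; the real data are not available.
set.seed(1)
tab1 <- data.frame(ozon = rnorm(25, mean = 5, sd = 1.5))

# A stray object with the same name in the workspace (assumed to be a
# data frame, since the poster could print ozon$V1).
ozon <- data.frame(V1 = runif(25))

attach(tab1)   # R reports that 'ozon' is masked _by_ .GlobalEnv

# Plain 'ozon' resolves to the workspace object, not to the column of tab1.
found_global <- is.data.frame(ozon)

# Naming the data frame explicitly sidesteps the search path.
res1 <- shapiro.test(tab1$ozon)
res2 <- with(tab1, shapiro.test(ozon))

detach(tab1)   # tidy up the search path again
```

Both explicit forms test the same column, so res1 and res2 agree; the bare
name ozon never reaches the attached column while the other object exists.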


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Stefan Scheurer
 Sent: Wednesday, May 26, 2010 1:37 PM
 To: r-help@r-project.org
 Subject: [R] shapiro.test
 
 
 Hi,
 I am not so sure about an error message I got when using shapiro.test.
 I imported some data into R by writing it into a .txt file via
  tab1 <- read.table(etctxt,header=T)
  attach(tab1)
   The following object(s) are masked _by_ .GlobalEnv :
      ozon
  ozon$V1
   [1] 2.5 3.0 5.6 4.7 6.5 6.7 1.7 5.3 4.6 7.4 5.4 4.1 5.1 5.6 5.4 6.1 7.6
  [18] 6.2 6.0 5.5 5.8 8.2 3.1 5.8 2.6
 Now I wanted to use the shapiro.test:
  shapiro.test(ozon)
 Error in sort.list(x[complete.cases(x)]) :
   'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list?
 Can anyone help please?
 Best regards
 



[R] shapiro.test

2010-05-26 Thread Stefan Scheurer

Hi,
I am not so sure about an error message I got when using shapiro.test.
I imported some data into R by writing it into a .txt file via
 tab1 <- read.table(etctxt,header=T)
 attach(tab1)
  The following object(s) are masked _by_ .GlobalEnv :
     ozon
 ozon$V1
  [1] 2.5 3.0 5.6 4.7 6.5 6.7 1.7 5.3 4.6 7.4 5.4 4.1 5.1 5.6 5.4 6.1 7.6
 [18] 6.2 6.0 5.5 5.8 8.2 3.1 5.8 2.6
Now I wanted to use the shapiro.test:
 shapiro.test(ozon)
Error in sort.list(x[complete.cases(x)]) :
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
Can anyone help please?
Best regards
  


Re: [R] shapiro.test

2010-05-26 Thread David Winsemius


On May 26, 2010, at 3:36 PM, Stefan Scheurer wrote:



Hi,
I am not so sure about an error message I got when using shapiro.test.
I imported some data into R by writing it into a .txt file via

tab1 <- read.table(etctxt,header=T)
attach(tab1)

The following object(s) are masked _by_ .GlobalEnv :
    ozon
ozon$V1
 [1] 2.5 3.0 5.6 4.7 6.5 6.7 1.7 5.3 4.6 7.4 5.4 4.1 5.1 5.6 5.4 6.1 7.6
[18] 6.2 6.0 5.5 5.8 8.2 3.1 5.8 2.6


This would imply that ozon is a list or dataframe.


Now I wanted to use the shapiro.test:

shapiro.test(ozon)
Error in sort.list(x[complete.cases(x)]) :
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?


And you tried to give the whole list to a function that only wants a  
vector.
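
A small sketch of the difference, assuming (as inferred above) that ozon is
a data frame with a column V1; the values are simulated here rather than
taken from the poster's file:

```r
# Simulated stand-in for the poster's object; only the names 'ozon' and
# 'V1' come from the thread.
set.seed(123)
ozon <- data.frame(V1 = rnorm(25, mean = 5, sd = 1.5))

# Passing the whole data frame fails (the exact wording of the error
# depends on the R version and locale).
whole_fails <- inherits(try(shapiro.test(ozon), silent = TRUE), "try-error")

# Extracting the column gives an atomic numeric vector, which works.
x <- ozon[["V1"]]          # equivalent to ozon$V1
x_is_atomic <- is.atomic(x)
res <- shapiro.test(x)
res$p.value                # a single number between 0 and 1
```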




Can anyone help please?
Best regards



David Winsemius, MD
West Hartford, CT



[R] Shapiro.test on data frame

2009-06-22 Thread Gonzalo Quiroga
Hi, I need help performing shapiro.test on a data frame. I know that
this test works only with a vector, but I guess there must be a way to
perform it on a data frame instead of vector by vector (I've got 40
variables to analyze and it's kinda annoying to do it one by one).

Thanks to anyone that can help me.

 Gonzalo Quiroga



Re: [R] Shapiro.test on data frame

2009-06-22 Thread Henrique Dallazuanna
Try this:

x <- data.frame(A = runif(10), B = rnorm(10))
lapply(x, shapiro.test)
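
With 40 variables the list of full test printouts gets long, so one possible
follow-up is to collect just the statistic and p-value per column. This is
only a sketch: the seed, the extra column C, and the names tests, w, p and
summary_tab are made up for illustration.

```r
set.seed(42)
x <- data.frame(A = runif(10), B = rnorm(10), C = letters[1:10])

# shapiro.test() needs numeric input, so keep only the numeric columns.
num <- x[vapply(x, is.numeric, logical(1))]

# One test per column; the result is a named list of "htest" objects.
tests <- lapply(num, shapiro.test)

# Pull the W statistic and the p-value out of each test.
w <- vapply(tests, function(t) unname(t$statistic), numeric(1))
p <- vapply(tests, function(t) t$p.value, numeric(1))

summary_tab <- data.frame(W = w, p.value = p)
summary_tab   # one row per numeric column (A and B here)
```

Keep in mind that running 40 tests at once invites multiple-comparison
issues, so the p-values are better read as a screening aid than as 40
separate verdicts.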

On Mon, Jun 22, 2009 at 3:15 PM, Gonzalo Quiroga
quirogagonz...@gmail.com wrote:

 Hi, I need help performing shapiro.test on a data frame. I know that
 this test works only with a vector, but I guess there must be a way to
 perform it on a data frame instead of vector by vector (I've got 40
 variables to analyze and it's kinda annoying to do it one by one).

 Thanks to anyone that can help me.

 Gonzalo Quiroga





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Shapiro.test on data frame

2009-06-22 Thread Dylan Beaudette
On Monday 22 June 2009, Gonzalo Quiroga wrote:
 Hi, I need help performing shapiro.test on a data frame. I know that
 this test works only with a vector, but I guess there must be a way to
 perform it on a data frame instead of vector by vector (I've got 40
 variables to analyze and it's kinda annoying to do it one by one).

 Thanks to anyone that can help me.

  Gonzalo Quiroga


are you looking to perform this column-wise or row-wise?

see ?apply for ideas
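
For instance, a rough sketch of both directions on simulated data (the
matrix m and the names p_cols and p_rows are made up; a data frame passed to
apply is first converted with as.matrix, so all columns should be numeric):

```r
set.seed(7)
m <- matrix(rnorm(60), nrow = 10, ncol = 6)   # 10 observations, 6 variables

# MARGIN = 2: one test per column (per variable).
p_cols <- apply(m, 2, function(v) shapiro.test(v)$p.value)

# MARGIN = 1: one test per row (per observation across variables).
p_rows <- apply(m, 1, function(v) shapiro.test(v)$p.value)

length(p_cols)   # one p-value per column
length(p_rows)   # one p-value per row
```

Note that shapiro.test needs at least 3 non-missing values per call, so the
row-wise version only makes sense when there are enough columns.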

cheers,
Dylan

-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
