Re: [R] shapiro.test
Regarding:

> ... I don't know what the 4th to last page would be called (could add another ante-, or in R just use tail(book,4))...

According to wordsmith.org (sign-up is free; note that I have no affiliation with that site), the word is preantepenultimate. Check it out: http://wordsmith.org/words/preantepenultimate.html

Yes, it's off-topic. Please write your congressman.

KW

--
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] shapiro.test
Philippe, replies inline.

On Sat, Feb 22, 2014 at 12:29 AM, Philippe Grosjean phgrosj...@sciviews.org wrote:

> Greg, I really like that TeachingDemos::SnowsPenultimateNormalityTest()...

If you like that function then you may appreciate TeachingDemos::SnowsCorrectlySizedButOtherwiseUselessTestOfAnything, which I suspect (but have been too lazy to check) may be the longest exported function name in a CRAN package. I justify the names of these 2 functions using the same logic that suggests short and simple names for functions that you would expect to be used often.

> even the tortuous way to always return a p-value == 0:

It turns out (discovered by accident and then brought to my attention) that if you run SnowsPenultimateNormalityTest on a vector of length 0 then it does return a p-value of 1. I have not yet decided if this is a bug or a feature. On one hand, it makes sense that a sample of size 0 is perfectly consistent with the assumption that you chose 0 observations from a normal distribution; on the other hand, if it is an integer or double vector of length 0, that would still be information that the numbers (or lack thereof) are rational.

> [snip] I am just curious... Are there teachers out there pointing to that test? If yes, what fraction of the students realise what happens? I guess, it is closer to zero than to one, unfortunately. Wait... I need another SnowsPenultimateXxxxTest() here to check the null hypothesis that all my students are doing what they are supposed to do when discovering a new statistical tool!

I don't know of any teachers pointing to the test; I would want to be careful which class to bring it up in. For some students it could result in an epiphany, others may just blindly use it, and still others may have their heads explode if they have to think too hard about it.

I was originally considering naming the test SnowsAntepenultimateTest to give a little more room for follow-up tests, but at the time I could not remember if it was Ante (before) or Anti (opposite). I learned the word Antepenultimate in terms of pages in a book, where the 3rd to last page (the Antepenultimate page) is directly opposite (Anti-) the Penultimate page.

Just in case that is not confusing enough, the ultimate page of a cheap detective novel is the last page, where the hero realizes that since the motive for the murder was to cover up the murderer's embezzlement of the family fortune to pay off his bookie, the hero will not be paid after all and will still need to continue avoiding his loan shark. The penultimate page is the second to last page, where in response to the hero's listing of circumstantial evidence the murderer conveniently confesses and fills in all the missing details, saving the hero the embarrassment he would have faced if the murderer had just lawyered up and been acquitted due to lack of hard evidence. And the antepenultimate page is the 3rd to last, where the hero utters the cliche phrase "You are probably wondering why I gathered you all here."

I don't know what the 4th to last page would be called (could add another ante-, or in R just use tail(book,4)).

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
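The length-0 behaviour mentioned in this message can be traced to how any() treats an empty logical vector. What follows is only a minimal sketch, assuming the p-value logic matches the snippet Philippe quotes elsewhere in this thread; the wrapper name sketch_p is hypothetical and is not part of TeachingDemos.

```r
# Mirrors the snippet quoted in the thread: every element is declared rational.
is.rational <- function(x) {
  rep(TRUE, length(x))
}

# Hypothetical wrapper around the quoted p-value logic (not a TeachingDemos function).
sketch_p <- function(x) {
  if (any(is.rational(x))) 0 else 1
}

p_nonempty <- sketch_p(rnorm(10))    # any() sees at least one TRUE, so 0
p_empty    <- sketch_p(numeric(0))   # any(logical(0)) is FALSE, so 1

c(nonempty = p_nonempty, empty = p_empty)
```

Under that assumption the empty vector falls through to the else branch simply because there is no element for any() to find.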
Re: [R] shapiro.test
Greg,

For some authors the 4th page from the back should be the first page. Not so for you, however.

Clint

Clint Bowman                INTERNET: cl...@ecy.wa.gov
Air Quality Modeler         INTERNET: cl...@math.utah.edu
Department of Ecology       VOICE: (360) 407-6815
PO Box 47600                FAX: (360) 407-7534
Olympia, WA 98504-7600

USPS: PO Box 47600, Olympia, WA 98504-7600
Parcels: 300 Desmond Drive, Lacey, WA 98503-1274

On Mon, 24 Feb 2014, Greg Snow wrote:

> [...] I don't know what the 4th to last page would be called (could add another ante-, or in R just use tail(book,4)).
Re: [R] shapiro.test
Hello,

Inline.

On 21-02-2014 23:13, Rolf Turner wrote:

> On 22/02/14 11:04, Rui Barradas wrote:
>
>> Hello, Not answering directly to your question, if the sample size is a documented problem with shapiro.test and you want a normality test, why don't you use ?ks.test?
>>
>>     m <- mean(HP_TrinityK25$V2)
>>     s <- sd(HP_TrinityK25$V2)
>>     ks.test(HP_TrinityK25$V2, pnorm, m, s)
>
> Strictly speaking this is not a valid test. The KS test is used for testing against a *completely specified* distribution. If there are parameters to be estimated, the null distribution is no longer applicable. This may not be a real problem if the parameters are *well* estimated, as they would be in this instance (given that the sample size is over-large). I'm not sure about this.

Yes, you're right. I hesitated before posting my answer precisely because of this: the parameters must be pre-determined constants, not computed from the data. As Greg pointed out in his reply, the help page ?ks.test also states this explicitly (which I had missed).

The chi-squared goodness-of-fit test seems to be a good choice, given the sample size.

Rui Barradas
Re: [R] shapiro.test
Second.

Rui Barradas

On 21-02-2014 23:44, Rolf Turner wrote:

> On 22/02/14 11:53, Greg Snow wrote:
>
>> SNIP
>> Why are you testing your data for normality? For large sample sizes the normality tests often give a meaningful answer to a meaningless question (for small samples they give a meaningless answer to a meaningful question).
>> SNIP
>
> Fortune!!!
>
> cheers,
> Rolf
[R] shapiro.test
Dear R users,

Please help with this maybe basic question. I am trying to see if my data are normal, but the file is large and the test does not work. I keep getting the message:

    Error in shapiro.test(x = HP_TrinityK25$V2) : sample size must be between 3 and 5000

Thanks!

    shapiro.test(x=HP_TrinityK25$V2)
    Error in shapiro.test(x = HP_TrinityK25$V2) :
      sample size must be between 3 and 5000

## Note:
## HP_TrinityK25 = my file
## HP_TrinityK25$V2 = data in my file
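For readers without the poster's file, the error can be reproduced with simulated data. This is only a sketch: the vectors below are made up and stand in for HP_TrinityK25$V2, and the seed is arbitrary.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x_ok  <- rnorm(5000)             # exactly at the documented upper limit
x_big <- rnorm(5001)             # one observation too many

# At the limit the test runs normally.
res_ok <- shapiro.test(x_ok)

# Above the limit the call stops; the message is captured instead of aborting.
msg <- tryCatch(
  {
    shapiro.test(x_big)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

The captured message should be the same "sample size must be between 3 and 5000" text shown in the question (in an English locale).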
Re: [R] shapiro.test
Hello,

Not answering your question directly: if the sample size is a documented problem with shapiro.test and you want a normality test, why don't you use ?ks.test?

    m <- mean(HP_TrinityK25$V2)
    s <- sd(HP_TrinityK25$V2)
    ks.test(HP_TrinityK25$V2, pnorm, m, s)

Hope this helps,

Rui Barradas

On 21-02-2014 15:59, Gonzalo Villarino Pizarro wrote:

> [...] I am trying to see if my data is normal but is a large file and the test does not work. I keep getting the message:
>
>     Error in shapiro.test(x = HP_TrinityK25$V2) : sample size must be between 3 and 5000
Re: [R] shapiro.test
Rui,

Note this quote from the last paragraph of the Details section of ?ks.test:

    If a single-sample test is used, the parameters specified in '...' must be pre-specified and not estimated from the data.

Which is the exact opposite of your example.

Gonzalo,

Why are you testing your data for normality? For large sample sizes the normality tests often give a meaningful answer to a meaningless question (for small samples they give a meaningless answer to a meaningful question).

If you really feel the need for a p-value then SnowsPenultimateNormalityTest in the TeachingDemos package will work for large sample sizes. But note that the documentation for that function is considered more useful than the function itself.

On Fri, Feb 21, 2014 at 3:04 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

> [...] why don't you use ?ks.test?
>
>     m <- mean(HP_TrinityK25$V2)
>     s <- sd(HP_TrinityK25$V2)
>     ks.test(HP_TrinityK25$V2, pnorm, m, s)

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
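The ?ks.test caveat quoted in this message can be illustrated with a small simulation. This is a sketch under assumed settings (1000 replicates of 50 standard-normal values, arbitrary seed), not something posted in the thread: it compares how often the test rejects at the 5% level when the null hypothesis is true, once with parameters estimated from the data and once with parameters fixed in advance.

```r
set.seed(42)          # arbitrary seed
n_rep <- 1000         # number of simulated samples (assumed value)
n     <- 50           # size of each sample (assumed value)

p_est <- replicate(n_rep, {
  x <- rnorm(n)
  # Parameters estimated from the same data: the case ?ks.test warns about.
  ks.test(x, "pnorm", mean(x), sd(x))$p.value
})

p_fixed <- replicate(n_rep, {
  x <- rnorm(n)
  # Parameters specified in advance: the case the test is designed for.
  ks.test(x, "pnorm", 0, 1)$p.value
})

# Share of p-values below 0.05 although the data really are normal.
reject_est   <- mean(p_est < 0.05)
reject_fixed <- mean(p_fixed < 0.05)
c(estimated = reject_est, fixed = reject_fixed)
```

With pre-specified parameters the rejection rate should sit near the nominal 5%; with estimated parameters it should fall well below that, because fitting the mean and standard deviation pulls the hypothesised curve towards the data.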
Re: [R] shapiro.test
On 22/02/14 11:04, Rui Barradas wrote:

> Hello, Not answering directly to your question, if the sample size is a documented problem with shapiro.test and you want a normality test, why don't you use ?ks.test?
>
>     m <- mean(HP_TrinityK25$V2)
>     s <- sd(HP_TrinityK25$V2)
>     ks.test(HP_TrinityK25$V2, pnorm, m, s)

Strictly speaking this is not a valid test. The KS test is used for testing against a *completely specified* distribution. If there are parameters to be estimated, the null distribution is no longer applicable. This may not be a real problem if the parameters are *well* estimated, as they would be in this instance (given that the sample size is over-large). I'm not sure about this.

The Lilliefors test is theoretically available in this context when mu and sigma are estimated, but according to the Wikipedia article, the Lilliefors distribution is not known analytically and the critical values must be determined by Monte Carlo methods. There is a LillieTest function in the DescTools package which makes use of some approximations to get p-values.

However, I think that a better approach would be to use a chi-squared goodness of fit test, whereby you can adjust for estimated parameters simply by reducing the degrees of freedom. I believe that the chi-squared test is somewhat low in power, but with a very large sample this should not be a problem.

The difficulty with the chi-squared test is that the choice of bins is somewhat arbitrary. I believe the best approach is to take the bin boundaries to be the quantiles of the normal distribution (with parameters m and s) corresponding to equispaced probabilities on [0,1], with the number of such probabilities being k+1 where k = floor(n/5), n being the sample size. This makes the expected counts all equal to n/k, which is at least 5 (and exactly 5 when n is a multiple of 5), so that the chi-squared test is valid. The degrees of freedom are then k-3 (k - 1 - #estimated parameters).
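The binning recipe just described might be sketched as follows. The function name chisq_normal_gof and the simulated inputs are assumptions made for illustration; this is not code posted in the thread and has not been checked against any package implementation.

```r
# Hypothetical helper following the recipe above; not taken from any package.
chisq_normal_gof <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  k <- floor(n / 5)                       # number of bins
  stopifnot(k >= 4)                       # need k - 3 >= 1 degrees of freedom

  # k + 1 equispaced probabilities on [0, 1] give k equiprobable bins;
  # the two outer boundaries are -Inf and Inf.
  breaks   <- qnorm(seq(0, 1, length.out = k + 1), mean = m, sd = s)
  observed <- as.vector(table(cut(x, breaks = breaks)))
  expected <- n / k                       # same expected count in every bin

  statistic <- sum((observed - expected)^2 / expected)
  df        <- k - 3                      # k - 1 - 2 estimated parameters
  p.value   <- pchisq(statistic, df = df, lower.tail = FALSE)

  list(statistic = statistic, df = df, p.value = p.value)
}

set.seed(123)                             # arbitrary seed
res_normal <- chisq_normal_gof(rnorm(10000))   # data that really are normal
res_skewed <- chisq_normal_gof(rexp(10000))    # clearly non-normal data
```

With 10000 observations this gives k = 2000 bins and 1997 degrees of freedom. The exponential sample should be rejected decisively, while the p-value for the normal sample depends on the seed and is not shown here.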
One last comment: I believe that it is generally considered that testing for normality is a waste of time and a pseudo-intellectual exercise of academic interest at best.

cheers,

Rolf Turner
Re: [R] shapiro.test
On 22/02/14 11:53, Greg Snow wrote:

> SNIP
> Why are you testing your data for normality? For large sample sizes the normality tests often give a meaningful answer to a meaningless question (for small samples they give a meaningless answer to a meaningful question).
> SNIP

Fortune!!!

cheers,

Rolf
Re: [R] shapiro.test
Second!!

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
H. Gilbert Welch

On Fri, Feb 21, 2014 at 3:44 PM, Rolf Turner r.tur...@auckland.ac.nz wrote:

> On 22/02/14 11:53, Greg Snow wrote:
>
>> SNIP
>> Why are you testing your data for normality? For large sample sizes the normality tests often give a meaningful answer to a meaningless question (for small samples they give a meaningless answer to a meaningful question).
>> SNIP
>
> Fortune!!!
>
> cheers,
> Rolf
Re: [R] shapiro.test
Greg,

I really like that TeachingDemos::SnowsPenultimateNormalityTest()… even the tortuous way to always return a p-value == 0:

    # the following function works for current implementations of R
    # to my knowledge, eventually it may need to be expanded
    is.rational <- function(x){
      rep( TRUE, length(x) )
    }

    tmp.p <- if( any(is.rational(x))) {
      0
    } else {
      # current implementation will not get here if length
      # of x is positive. This part is reserved for the
      # ultimate test
      1
    }

(p.value is then returned as tmp.p).

Also, the nice and sexy printing of that p-value in R as:

    p-value < 2.2e-16

which looks much more serious than 'p-value = 0'… Here you have nothing to do: the stats::format.pval() function called from stats:::print.htest() already does the job for you!

I am just curious… Are there teachers out there pointing to that test? If yes, what fraction of the students realise what happens? I guess it is closer to zero than to one, unfortunately. Wait… I need another SnowsPenultimateXxxxTest() here to check the null hypothesis that all my students are doing what they are supposed to do when discovering a new statistical tool!

Best,

Philippe Grosjean

On 21 Feb 2014, at 23:53, Greg Snow 538...@gmail.com wrote:

> [...] If you really feel the need for a p-value then SnowsPenultimateNormalityTest in the TeachingDemos package will work for large sample sizes. But note that the documentation for that function is considered more useful than the function itself.
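The printing behaviour Philippe describes can be seen by calling format.pval() directly. This is a small sketch using only the stats function he names; the exact number of digits shown depends on the digits argument, so no output is reproduced here.

```r
# A p-value of exactly 0 lies below the default eps (.Machine$double.eps),
# so format.pval() reports an upper bound rather than the string "0".
shown_zero <- format.pval(0)
shown_zero

# The reporting threshold can be changed through the eps argument:
# values below eps are printed as "< eps", the others as ordinary numbers.
shown_mixed <- format.pval(c(0.2, 0.00001, 0), eps = 1e-4)
shown_mixed
```

This is why a test that returns p.value = 0 still prints as an inequality instead of a literal zero.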
[R] shapiro.test()
Hey,

today I wanted to use shapiro.test() on data containing 3 numerical values per group. It is the first time that an NA was returned for some of the groups. In the following, an example of code and output is shown:

    shapiro.test(c(0.000637806, 0.00175561, 0.001196708))

            Shapiro-Wilk normality test

    data:  c(0.000637806, 0.00175561, 0.001196708)
    W = 1, p-value = NA

I am not able to find the bug in our data, so I think there might be a problem with shapiro.test(). I use the following technical background:

    platform       x86_64-pc-linux-gnu
    arch           x86_64
    os             linux-gnu
    system         x86_64, linux-gnu
    status
    major          2
    minor          14.1
    year           2011
    month          12
    day            22
    svn rev        57956
    language       R
    version.string R version 2.14.1 (2011-12-22)

Thanks,

Judith
Re: [R] shapiro.test()
See ?shapiro.test: "...the number of non-missing values must be between 3 and 5000."

By the way, how reasonable is it to test the normality of 3 values?

Best

ozgur

--
View this message in context: http://r.789695.n4.nabble.com/shapiro-test-tp4634513p4634520.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] shapiro.test()
Actually, your sample size is 3. Sorry for that.

Ozgur
Re: [R] shapiro.test()
On Jun 26, 2012, at 16:43 , r...@uni-potsdam.de wrote:

> Hey, today I wanted to use the shapiro.test() on data containing 3 numerical values per group. It is the first time that an NA was given back for some of the groups. In the follwing an example of code and output is shown:
>
>     shapiro.test(c(0.000637806, 0.00175561, 0.001196708))
>     Shapiro-Wilk normality test
>     data: c(0.000637806, 0.00175561, 0.001196708)
>     W = 1, p-value = NA
>
> I am not able to find the bug in our data, so I think there might be a problem with the shapiro.test().

The clue is that

    diff(sort(c(0.000637806, 0.00175561, 0.001196708)))
    [1] 0.000558902 0.000558902

which is either an extreme coincidence or a sign that your data are not independent samples from a continuous distribution.

Since the normal quantiles are also equidistant, you get a correlation of W=1 in the QQ-plot, and apparently this triggers the NA p-value. I suppose returning p=1.0 would arguably be a better choice for this case, but it _is_ pretty extreme.

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
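Peter's observation can be checked by hand. The sketch below assumes the standard Shapiro-Wilk coefficients for n = 3, a = (-1/sqrt(2), 0, 1/sqrt(2)), and computes W directly rather than calling shapiro.test(), since what the p-value looks like in this edge case may differ between R versions.

```r
x  <- c(0.000637806, 0.00175561, 0.001196708)
xs <- sort(x)

# The two gaps between the ordered values agree up to floating-point error.
gaps <- diff(xs)
equal_gaps <- isTRUE(all.equal(gaps[1], gaps[2]))

# Shapiro-Wilk statistic by hand for n = 3 (coefficients assumed as stated above).
a <- c(-1, 0, 1) / sqrt(2)
W <- sum(a * xs)^2 / sum((xs - mean(xs))^2)

# Correlation between the ordered data and symmetric normal quantiles.
r <- cor(xs, qnorm(ppoints(3)))

c(W = W, r = r)
```

For three equally spaced values the numerator and denominator of W are both twice the squared gap, so W equals 1 apart from rounding, which is the perfectly straight QQ-plot Peter refers to.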
Re: [R] shapiro.test
Hi,

David Winsemius wrote:

> [snip]
> This would imply that ozon is a list or dataframe.
> [snip]
> And you tried to give the whole list to a function that only wants a vector.

And whenever you suspect that your data types clash, try str() to find out just what kind of thing your data is. Here:

    str(ozon)

HTH,

Stephan
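As a sketch of what Stephan's str() check reveals, the example below builds a small stand-in for the poster's object. The data frame and its five values are assumptions for illustration; the real ozon object was never posted in full.

```r
# Hypothetical stand-in for the poster's object: a data frame with one column V1.
ozon <- data.frame(V1 = c(2.5, 3.0, 5.6, 4.7, 6.5))

str(ozon)        # describes a data.frame with a numeric column V1
str(ozon$V1)     # describes a plain numeric vector

is.atomic(ozon)      # FALSE: a data frame is a list of columns
is.atomic(ozon$V1)   # TRUE: an atomic vector, which is what shapiro.test() needs

res <- shapiro.test(ozon$V1)   # works once the column is passed
```

The distinction between the whole data frame and one of its columns is exactly what the "'x' must be atomic" error in the question is complaining about.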
Re: [R] shapiro.test
Others pointed out that the error message is due to ozon being a data frame, but I think the true source of confusion comes a bit earlier. You really need to understand more about data objects and the search path.

You first read in a table and name it tab1. Then you attach tab1 to the search path (there are better ways than attach now; while it can be a useful tool, it can also easily lead to problems like the one you are seeing). The warning from attach tells you that there are now 2 things in the search path with the name ozon: one is an object in the global environment, the other is one of the columns of tab1. The warning also tells you that the object in the global environment masks (or will take precedence over) the tab1 column. You then print out the 'V1' column of the ozon object (which has nothing to do with the ozon column in tab1).

Then you do the Shapiro test. Given that you show us reading in and attaching tab1, I would assume that you want the test done on the ozon column of tab1, but R finds the ozon object in the global environment before it finds the column in tab1, and you get the error.

Remember that computers are stupid: they do exactly what they are told to do, so tell R exactly what you want it to do. Either remove the ozon object so that it is not found first, or use commands like:

shapiro.test(tab1$ozon)
with(tab1, shapiro.test(ozon))

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Stefan Scheurer
Sent: Wednesday, May 26, 2010 1:37 PM
To: r-help@r-project.org
Subject: [R] shapiro.test

Hi,

I am not so sure about an error note I got when using shapiro.test. I imported some data into R by writing it into a .txt file via

tab1 <- read.table(etctxt, header=T)
attach(tab1)

The following object(s) are masked _by_ .GlobalEnv :

ozon

ozon$V1
[1] 2.5 3.0 5.6 4.7 6.5 6.7 1.7 5.3 4.6 7.4 5.4 4.1 5.1 5.6 5.4 6.1 7.6
[18] 6.2 6.0 5.5 5.8 8.2 3.1 5.8 2.6

Now I wanted to use the shapiro.test:

shapiro.test(ozon)
Error in sort.list(x[complete.cases(x)]) :
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Can anyone help please?

Best regards
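The masking situation Greg describes can be sketched roughly as follows. Both objects here are hypothetical and the values are made up: tab1 stands in for the imported table, and the global ozon stands in for the unrelated leftover object. attach() is deliberately left out so the sketch has no side effects on the search path.

```r
# Hypothetical imported table with a column named ozon (made-up values).
tab1 <- data.frame(ozon = c(2.5, 3.0, 5.6, 4.7, 6.5, 6.7, 1.7, 5.3))

# Hypothetical unrelated object of the same name in the global environment.
# A bare 'ozon' resolves to this object first, even if tab1 were attached.
ozon <- data.frame(V1 = c(1, 2, 3))

# Naming the data frame explicitly avoids the ambiguity:
res_dollar <- shapiro.test(tab1$ozon)

# with() evaluates the expression inside tab1, so 'ozon' is the column here:
res_with <- with(tab1, shapiro.test(ozon))

# The other option from the message: remove the masking object altogether.
rm(ozon)
```

Both calls test the same column, so they should give the same statistic and p-value; only the data name stored in the result differs.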
[R] shapiro.test
Hi,

I am not so sure about an error note I got when using shapiro.test. I imported some data into R by writing it into a .txt file via

tab1 <- read.table(etctxt, header=T)
attach(tab1)

The following object(s) are masked _by_ .GlobalEnv :

ozon

ozon$V1
[1] 2.5 3.0 5.6 4.7 6.5 6.7 1.7 5.3 4.6 7.4 5.4 4.1 5.1 5.6 5.4 6.1 7.6
[18] 6.2 6.0 5.5 5.8 8.2 3.1 5.8 2.6

Now I wanted to use the shapiro.test:

shapiro.test(ozon)
Error in sort.list(x[complete.cases(x)]) :
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Can anyone help please?

Best regards
Re: [R] shapiro.test
On May 26, 2010, at 3:36 PM, Stefan Scheurer wrote:

Hi,

I am not so sure about an error note I got when using shapiro.test. I imported some data into R by writing it into a .txt file via

tab1 <- read.table(etctxt, header=T)
attach(tab1)

The following object(s) are masked _by_ .GlobalEnv :

ozon

ozon$V1
[1] 2.5 3.0 5.6 4.7 6.5 6.7 1.7 5.3 4.6 7.4 5.4 4.1 5.1 5.6 5.4 6.1 7.6
[18] 6.2 6.0 5.5 5.8 8.2 3.1 5.8 2.6

This would imply that ozon is a list or dataframe.

Now I wanted to use the shapiro.test:

shapiro.test(ozon)
Error in sort.list(x[complete.cases(x)]) :
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

And you tried to give the whole list to a function that only wants a vector.

Can anyone help please?

Best regards

David Winsemius, MD
West Hartford, CT
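To make the point about vectors concrete, here is a small sketch that passes a single column to shapiro.test. The 25 values are the ones printed in the original post. Rebuilding ozon by hand as a one-column data frame is an assumption about its structure, not something the thread confirms.

```r
# Assumed structure: a data frame with one column V1 holding the posted values.
ozon <- data.frame(V1 = c(2.5, 3.0, 5.6, 4.7, 6.5, 6.7, 1.7, 5.3, 4.6, 7.4,
                          5.4, 4.1, 5.1, 5.6, 5.4, 6.1, 7.6, 6.2, 6.0, 5.5,
                          5.8, 8.2, 3.1, 5.8, 2.6))

# shapiro.test() wants a numeric vector, so hand it the column, not the data frame.
result <- shapiro.test(ozon$V1)

result$statistic  # the W statistic
result$p.value    # the p-value of the test
```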
[R] Shapiro.test on data frame
Hi,

I need help to perform a Shapiro.test on a data frame. I know that this test works only with a vector, but I guess there must be a way to perform it on a data frame instead of vector by vector (i.e. I've got 40 variables to analyze and it's kinda annoying to do it one by one).

Thanks to anyone that can help me.

Gonzalo Quiroga
Re: [R] Shapiro.test on data frame
Try this:

x <- data.frame(A = runif(10), B = rnorm(10))
lapply(x, shapiro.test)

On Mon, Jun 22, 2009 at 3:15 PM, Gonzalo Quiroga quirogagonz...@gmail.com wrote:

Hi,

I need help to perform a Shapiro.test on a data frame. I know that this test works only with a vector, but I guess there must be a way to perform it on a data frame instead of vector by vector (i.e. I've got 40 variables to analyze and it's kinda annoying to do it one by one).

Thanks to anyone that can help me.

Gonzalo Quiroga

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
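Building on the lapply() suggestion, one possible way to get a compact overview for many variables is to pull just the p-values out of the list of results. This is only a sketch: the seed, the extra non-numeric column G, and the helper names are illustrative assumptions, not part of the original reply.

```r
set.seed(1)  # arbitrary seed so the random data can be regenerated

# Two numeric columns as in the reply, plus a hypothetical non-numeric column.
x <- data.frame(A = runif(10), B = rnorm(10), G = rep(c("a", "b"), 5))

# Keep only numeric columns, since shapiro.test() cannot handle the others.
num <- x[sapply(x, is.numeric)]

# One test per column; the result is a named list of "htest" objects.
tests <- lapply(num, shapiro.test)

# Extract the p-value from each result into a named numeric vector.
pvals <- sapply(tests, function(h) h$p.value)
pvals
```

With 40 variables, the same three steps apply unchanged; the names of pvals are the column names.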
Re: [R] Shapiro.test on data frame
On Monday 22 June 2009, Gonzalo Quiroga wrote:

Hi,

I need help to perform a Shapiro.test on a data frame. I know that this test works only with a vector, but I guess there must be a way to perform it on a data frame instead of vector by vector (i.e. I've got 40 variables to analyze and it's kinda annoying to do it one by one).

Thanks to anyone that can help me.

Gonzalo Quiroga

Are you looking to perform this column-wise or row-wise? See ?apply for ideas.

Cheers,
Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
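The column-wise versus row-wise distinction can be sketched with apply(), where the second argument selects the margin (1 for rows, 2 for columns). The matrix, its dimensions, and the seed below are made up for illustration. Note that apply() converts a data frame to a matrix first, so this approach assumes all columns are numeric.

```r
set.seed(42)  # arbitrary seed so the random data can be regenerated

# Hypothetical data: 10 observations of 4 variables.
m <- matrix(rnorm(40), nrow = 10, ncol = 4,
            dimnames = list(NULL, paste0("v", 1:4)))

# Column-wise: one test per variable (MARGIN = 2).
p_by_col <- apply(m, 2, function(v) shapiro.test(v)$p.value)

# Row-wise: one test per observation (MARGIN = 1).
# shapiro.test() needs at least 3 values, so each row must have 3 or more entries.
p_by_row <- apply(m, 1, function(v) shapiro.test(v)$p.value)
```

For the usual case of one test per variable, the column-wise form is the one that matches the question.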