Re: [R] sciplot question
On May 26, 2009, at 4:37 , Frank E Harrell Jr wrote: Manuel Morales wrote: On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: Great, thanks Manuel. Just out of curiosity, any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with confidence intervals)? - Jarle Bjørgeengen On May 24, 2009, at 3:02 , Manuel Morales wrote: You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) Minor improvement: mean(x) + qt.fun(x)*c(-1,1), but in general confidence limits should be asymmetric (a la bootstrap). Thanks, if the data are normally distributed, a symmetric confidence interval should be OK, right? Yes; I do see a normal distribution about once every 10 years. Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too? A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case. Incorrect. Try running some simulations on highly skewed data. You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05. The larger the sample size, the more skewness has to be present to cause this problem. OK - I'm convinced. It turns out that the first change I made to sciplot was to allow for asymmetric error bars.
Is there an easy way (i.e., an existing package) to bootstrap confidence intervals in R? If so, I'll try to incorporate this as an option in sciplot. library(Hmisc) ?smean.cl.boot H(arrel)misc :-) Thanks for valuable input Frank. This seems to work fine. (Slightly more time consuming, but what do we have CPU power for?) library(Hmisc) library(sciplot) my.ci <- function(x) c(smean.cl.boot(x)[2], smean.cl.boot(x)[3]) lineplot.CI(V1, V2, data=d, col=c(4), err.col=c(1), err.width=0.02, legend=FALSE, xlab="Timeofday", ylab="IOPS", ci.fun=my.ci, cex=0.5, lwd=0.7) Have I understood you correctly in that this is a more accurate way of visualizing variability in any dataset than the Student's t confidence intervals, because it does not assume normality? Can you explain the meaning of B, and how to find a sensible value (if the default is not sufficient)? Best regards Jarle Bjørgeengen __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sciplot question
Jarle Bjørgeengen wrote: On May 26, 2009, at 4:37 , Frank E Harrell Jr wrote: Manuel Morales wrote: On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: Great, thanks Manuel. Just out of curiosity, any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with confidence intervals)? - Jarle Bjørgeengen On May 24, 2009, at 3:02 , Manuel Morales wrote: You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) Minor improvement: mean(x) + qt.fun(x)*c(-1,1), but in general confidence limits should be asymmetric (a la bootstrap). Thanks, if the data are normally distributed, a symmetric confidence interval should be OK, right? Yes; I do see a normal distribution about once every 10 years. Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too? A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case. Incorrect. Try running some simulations on highly skewed data. You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05. The larger the sample size, the more skewness has to be present to cause this problem. OK - I'm convinced. It turns out that the first change I made to sciplot was to allow for asymmetric error bars.
Is there an easy way (i.e., an existing package) to bootstrap confidence intervals in R? If so, I'll try to incorporate this as an option in sciplot. library(Hmisc) ?smean.cl.boot H(arrel)misc :-) Thanks for valuable input Frank. This seems to work fine. (Slightly more time consuming, but what do we have CPU power for?) library(Hmisc) library(sciplot) my.ci <- function(x) c(smean.cl.boot(x)[2], smean.cl.boot(x)[3]) Don't double the execution time by running it twice! And this way you might possibly get an upper confidence limit that is lower than the lower one. Do function(x) smean.cl.boot(x)[-1] lineplot.CI(V1, V2, data=d, col=c(4), err.col=c(1), err.width=0.02, legend=FALSE, xlab="Timeofday", ylab="IOPS", ci.fun=my.ci, cex=0.5, lwd=0.7) Have I understood you correctly in that this is a more accurate way of visualizing variability in any dataset than the Student's t confidence intervals, because it does not assume normality? Yes, but instead of saying variability (which quantiles are good at) we are talking about the precision of the mean. Can you explain the meaning of B, and how to find a sensible value (if the default is not sufficient)? For most purposes the default is sufficient. There are great books and papers on the bootstrap for more info, including improved variations on the simple bootstrap percentile confidence interval used here. Frank Best regards Jarle Bjørgeengen -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
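Frank's one-pass point generalizes beyond R: compute both bootstrap limits from a single set of resamples, and the upper limit can never fall below the lower one. A minimal percentile-bootstrap sketch of what a helper like smean.cl.boot does (in Python purely for illustration; the function name and data values here are made up):

```python
import random
import statistics

def boot_ci_mean(x, b=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean, from ONE set of resamples."""
    rng = random.Random(seed)
    # b bootstrap means, each from a resample of x with replacement
    means = sorted(statistics.fmean(rng.choices(x, k=len(x))) for _ in range(b))
    alpha = (1 - level) / 2
    # both limits come from the same sorted list, so lo <= hi is guaranteed
    return means[int(alpha * b)], means[int((1 - alpha) * b) - 1]

data = [1.2, 0.4, 3.1, 0.9, 2.2, 0.5, 1.7, 4.0, 0.3, 1.1]
lo, hi = boot_ci_mean(data)
```

Calling the resampler once per limit, as in c(smean.cl.boot(x)[2], smean.cl.boot(x)[3]), is exactly the mistake Frank flags: two independent bootstrap runs can disagree, so take both limits from one call (smean.cl.boot(x)[-1] in R).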
Re: [R] sciplot question
On May 26, 2009, at 3:02 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 26, 2009, at 4:37 , Frank E Harrell Jr wrote: Manuel Morales wrote: On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: Great, thanks Manuel. Just out of curiosity, any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with confidence intervals)? - Jarle Bjørgeengen On May 24, 2009, at 3:02 , Manuel Morales wrote: You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) Minor improvement: mean(x) + qt.fun(x)*c(-1,1), but in general confidence limits should be asymmetric (a la bootstrap). Thanks, if the data are normally distributed, a symmetric confidence interval should be OK, right? Yes; I do see a normal distribution about once every 10 years. Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too? A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case. Incorrect. Try running some simulations on highly skewed data. You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05. The larger the sample size, the more skewness has to be present to cause this problem. OK - I'm convinced.
It turns out that the first change I made to sciplot was to allow for asymmetric error bars. Is there an easy way (i.e., an existing package) to bootstrap confidence intervals in R? If so, I'll try to incorporate this as an option in sciplot. library(Hmisc) ?smean.cl.boot H(arrel)misc :-) Thanks for valuable input Frank. This seems to work fine. (Slightly more time consuming, but what do we have CPU power for?) library(Hmisc) library(sciplot) my.ci <- function(x) c(smean.cl.boot(x)[2], smean.cl.boot(x)[3]) Don't double the execution time by running it twice! And this way you might possibly get an upper confidence limit that is lower than the lower one. Do function(x) smean.cl.boot(x)[-1] D'oh. lineplot.CI(V1, V2, data=d, col=c(4), err.col=c(1), err.width=0.02, legend=FALSE, xlab="Timeofday", ylab="IOPS", ci.fun=my.ci, cex=0.5, lwd=0.7) Have I understood you correctly in that this is a more accurate way of visualizing variability in any dataset than the Student's t confidence intervals, because it does not assume normality? Yes, but instead of saying variability (which quantiles are good at) we are talking about the precision of the mean. Right. Can you explain the meaning of B, and how to find a sensible value (if the default is not sufficient)? For most purposes the default is sufficient. There are great books and papers on the bootstrap for more info, including improved variations on the simple bootstrap percentile confidence interval used here. Frank Once again, thanks. Best regards - Jarle Bjørgeengen
Re: [R] sciplot question
On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: Great, thanks Manuel. Just out of curiosity, any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with confidence intervals)? - Jarle Bjørgeengen On May 24, 2009, at 3:02 , Manuel Morales wrote: You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) Minor improvement: mean(x) + qt.fun(x)*c(-1,1), but in general confidence limits should be asymmetric (a la bootstrap). Thanks, if the data are normally distributed, a symmetric confidence interval should be OK, right? Yes; I do see a normal distribution about once every 10 years. Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too? A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case. Best rgds Jarle Bjørgeengen
Re: [R] sciplot question
Jarle Bjørgeengen wrote: On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: Great, thanks Manuel. Just out of curiosity, any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with confidence intervals)? - Jarle Bjørgeengen On May 24, 2009, at 3:02 , Manuel Morales wrote: You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) Minor improvement: mean(x) + qt.fun(x)*c(-1,1), but in general confidence limits should be asymmetric (a la bootstrap). Thanks, if the data are normally distributed, a symmetric confidence interval should be OK, right? Yes; I do see a normal distribution about once every 10 years. Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too? A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case. Incorrect. Try running some simulations on highly skewed data. You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05. The larger the sample size, the more skewness has to be present to cause this problem.
Frank Best rgds Jarle Bjørgeengen -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
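Frank's tail-area claim is easy to check by simulation. A hedged sketch (in Python purely for illustration, with lognormal samples standing in for "highly skewed data" and the fixed critical value t ≈ 2.045 for 0.975 and df = 29, i.e. n = 30): for right-skewed data the symmetric interval misses the true mean almost entirely on one side.

```python
import math
import random
import statistics

def sym_t_ci(x, crit=2.045):  # t quantile for p = 0.975, df = 29 (n = 30)
    """Symmetric t-style confidence interval for the mean."""
    m = statistics.fmean(x)
    half = crit * statistics.stdev(x) / math.sqrt(len(x))
    return m - half, m + half

rng = random.Random(1)
true_mean = math.exp(0.5)    # mean of a lognormal(0, 1) distribution
miss_low = miss_high = 0     # interval entirely above / entirely below truth
for _ in range(2000):
    x = [rng.lognormvariate(0.0, 1.0) for _ in range(30)]
    lo, hi = sym_t_ci(x)
    if true_mean < lo:
        miss_low += 1
    elif true_mean > hi:
        miss_high += 1

coverage = 1 - (miss_low + miss_high) / 2000
```

Because small samples under-represent the long right tail, the interval usually sits too low: nearly all misses are of the true_mean > hi kind, so one tail area is near 0 and the other soaks up the whole error, just as described.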
Re: [R] sciplot question
Frank E Harrell Jr wrote: spencerg wrote: Dear Frank, et al.: Frank E Harrell Jr wrote: snip Yes; I do see a normal distribution about once every 10 years. To what do you attribute the nonnormality you see in most cases? (1) Unmodeled components of variance that can generate errors in interpretation if ignored, even with bootstrapping? (2) Honest outliers that do not relate to the phenomena of interest and would better be removed through improved checks on data quality, but where bootstrapping is appropriate (provided the data are not also contaminated with (1))? (3) Situations where the physical application dictates a different distribution such as binomial, lognormal, gamma, etc., possibly also contaminated with (1) and (2)? I've fit mixtures of normals to data before, but one needs to be careful about not carrying that to extremes, as the mixture may be a result of (1) and therefore not replicable. George Box once remarked that he thought most designed experiments included split plotting that had been ignored in the analysis. That is only a special case of (1). Thanks, Spencer Graves Spencer, Those are all important reasons for non-normality of margin distributions. But the biggest reason of all is that the underlying process did not know about the normal distribution. Normality in raw data is usually an accident. Frank: Might there be a difference between the physical and social sciences on this issue? The central limit effect works pretty well with many kinds of manufacturing data, except that it is often masked by between-lot components of variance. The first differences in log(prices) are often long-tailed and negatively skewed. Standard GARCH and similar models handle the long tails well but miss the skewness, at least in what I've seen. I think that can be fixed, but I have not yet seen it done. Social science data, however, often involve discrete scales where the raters' interpretations of the scales rarely match any standard distribution. 
Transforming to latent variables, e.g., via factor analysis, may help but does not eliminate the problem. Thanks for your comments. Spencer Frank
Re: [R] sciplot question
spencerg wrote: Frank E Harrell Jr wrote: spencerg wrote: Dear Frank, et al.: Frank E Harrell Jr wrote: snip Yes; I do see a normal distribution about once every 10 years. To what do you attribute the nonnormality you see in most cases? (1) Unmodeled components of variance that can generate errors in interpretation if ignored, even with bootstrapping? (2) Honest outliers that do not relate to the phenomena of interest and would better be removed through improved checks on data quality, but where bootstrapping is appropriate (provided the data are not also contaminated with (1))? (3) Situations where the physical application dictates a different distribution such as binomial, lognormal, gamma, etc., possibly also contaminated with (1) and (2)? I've fit mixtures of normals to data before, but one needs to be careful about not carrying that to extremes, as the mixture may be a result of (1) and therefore not replicable. George Box once remarked that he thought most designed experiments included split plotting that had been ignored in the analysis. That is only a special case of (1). Thanks, Spencer Graves Spencer, Those are all important reasons for non-normality of margin distributions. But the biggest reason of all is that the underlying process did not know about the normal distribution. Normality in raw data is usually an accident. Frank: Might there be a difference between the physical and social sciences on this issue? Hi Spencer, I doubt that the difference is large, but biological measurements seem to be more of a problem. The central limit effect works pretty well with many kinds of manufacturing data, except that it is often masked by between-lot components of variance. The first differences in log(prices) are often long-tailed and negatively skewed. Standard GARCH and similar models handle the long tails well but miss the skewness, at least in what I've seen. I think that can be fixed, but I have not yet seen it done. 
The central limit theorem in and of itself doesn't help because it doesn't tell you how large N must be before normality holds well enough. Social science data, however, often involve discrete scales where the raters' interpretations of the scales rarely match any standard distribution. Transforming to latent variables, e.g., via factor analysis, may help but does not eliminate the problem. Good example. Many of the scales I've seen are non-normal or even multi-modal. Thanks for your comments. Thanks for yours Frank Spencer Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] sciplot question
On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: Great, thanks Manuel. Just out of curiosity, any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with confidence intervals)? - Jarle Bjørgeengen On May 24, 2009, at 3:02 , Manuel Morales wrote: You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) Minor improvement: mean(x) + qt.fun(x)*c(-1,1), but in general confidence limits should be asymmetric (a la bootstrap). Thanks, if the data are normally distributed, a symmetric confidence interval should be OK, right? Yes; I do see a normal distribution about once every 10 years. Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too? A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case. Incorrect. Try running some simulations on highly skewed data. You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05. The larger the sample size, the more skewness has to be present to cause this problem. OK - I'm convinced. It turns out that the first change I made to sciplot was to allow for asymmetric error bars. Is there an easy way (i.e., an existing package) to bootstrap confidence intervals in R?
If so, I'll try to incorporate this as an option in sciplot. BTW Jarle - to answer an earlier question, standard error is the standard in my field, ecology, and that's why it's the current default in sciplot. Manuel Frank Best rgds Jarle Bjørgeengen -- http://mutualism.williams.edu
Re: [R] sciplot question
Manuel Morales wrote: On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: Great, thanks Manuel. Just out of curiosity, any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with confidence intervals)? - Jarle Bjørgeengen On May 24, 2009, at 3:02 , Manuel Morales wrote: You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) Minor improvement: mean(x) + qt.fun(x)*c(-1,1), but in general confidence limits should be asymmetric (a la bootstrap). Thanks, if the data are normally distributed, a symmetric confidence interval should be OK, right? Yes; I do see a normal distribution about once every 10 years. Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too? A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case. Incorrect. Try running some simulations on highly skewed data. You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05. The larger the sample size, the more skewness has to be present to cause this problem. OK - I'm convinced. It turns out that the first change I made to sciplot was to allow for asymmetric error bars. Is there an easy way (i.e., an existing package) to bootstrap confidence intervals in R?
If so, I'll try to incorporate this as an option in sciplot. library(Hmisc) ?smean.cl.boot BTW Jarle - to answer an earlier question, standard error is the standard in my field, ecology, and that's why it's the current default in sciplot. Too bad. Frank Manuel Frank Best rgds Jarle Bjørgeengen -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] sciplot question
You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) lineplot.CI(x.factor = dose, response = len, data = ToothGrowth, ci.fun=my.ci) Manuel On Fri, 2009-05-22 at 18:38 +0200, Jarle Bjørgeengen wrote: Hi, I would like to have lineplot.CI and barplot.CI to actually plot confidence intervals, instead of standard error. I understand I have to use the ci.fun option, but I'm not quite sure how. Like this: qt(0.975, df=n-1)*s/sqrt(n) - but how can I apply it to visualize the length of the Student's t confidence intervals rather than the standard error of the plotted means? -- http://mutualism.williams.edu
Re: [R] sciplot question
Great, thanks Manuel. Just out of curiosity, any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with confidence intervals)? - Jarle Bjørgeengen On May 24, 2009, at 3:02 , Manuel Morales wrote: You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) lineplot.CI(x.factor = dose, response = len, data = ToothGrowth, ci.fun=my.ci) Manuel On Fri, 2009-05-22 at 18:38 +0200, Jarle Bjørgeengen wrote: Hi, I would like to have lineplot.CI and barplot.CI to actually plot confidence intervals, instead of standard error. I understand I have to use the ci.fun option, but I'm not quite sure how. Like this: qt(0.975, df=n-1)*s/sqrt(n) - but how can I apply it to visualize the length of the Student's t confidence intervals rather than the standard error of the plotted means? -- http://mutualism.williams.edu
Re: [R] sciplot question
Jarle Bjørgeengen wrote: Great, thanks Manuel. Just out of curiosity, any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with confidence intervals)? - Jarle Bjørgeengen On May 24, 2009, at 3:02 , Manuel Morales wrote: You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) Minor improvement: mean(x) + qt.fun(x)*c(-1,1), but in general confidence limits should be asymmetric (a la bootstrap). I'm not sure how NAs are handled. Frank lineplot.CI(x.factor = dose, response = len, data = ToothGrowth, ci.fun=my.ci) Manuel On Fri, 2009-05-22 at 18:38 +0200, Jarle Bjørgeengen wrote: Hi, I would like to have lineplot.CI and barplot.CI to actually plot confidence intervals, instead of standard error. I understand I have to use the ci.fun option, but I'm not quite sure how. Like this: qt(0.975, df=n-1)*s/sqrt(n) - but how can I apply it to visualize the length of the Student's t confidence intervals rather than the standard error of the plotted means? -- http://mutualism.williams.edu -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
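Frank's aside about NAs is worth spelling out: in R, mean(x) and sd(x) return NA unless na.rm=TRUE, so the qt-based my.ci above would propagate NA into both error-bar limits. A sketch of the drop-missing alternative (in Python purely for illustration; the normal 0.975 quantile stands in for qt() and is a fair approximation only for larger n):

```python
import math
import statistics

def t_style_ci(x, crit=None):
    """Symmetric CI for the mean, dropping missing values first.

    crit defaults to the normal 0.975 quantile (~1.96) as a stand-in
    for qt(p=.975, df=n-1); pass the exact t quantile for small n.
    """
    xs = [v for v in x if v is not None and not math.isnan(v)]
    if crit is None:
        crit = statistics.NormalDist().inv_cdf(0.975)
    m = statistics.fmean(xs)
    half = crit * statistics.stdev(xs) / math.sqrt(len(xs))
    return m - half, m + half

# NaN and None entries are dropped before computing the interval
lo, hi = t_style_ci([1.0, 2.0, float("nan"), 3.0, None, 4.0])
```

The R analogue would be to write qt.fun and my.ci with na.rm=TRUE (and length(na.omit(x)) for the degrees of freedom), rather than letting NAs silently blank out the bars.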
Re: [R] sciplot question
On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: Great, thanks Manuel. Just out of curiosity, any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with confidence intervals)? - Jarle Bjørgeengen On May 24, 2009, at 3:02 , Manuel Morales wrote: You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) Minor improvement: mean(x) + qt.fun(x)*c(-1,1), but in general confidence limits should be asymmetric (a la bootstrap). Thanks, if the data are normally distributed, a symmetric confidence interval should be OK, right? When plotting the individual sample, it looks normally distributed. Best regards. Jarle Bjørgeengen
Re: [R] sciplot question
Jarle Bjørgeengen wrote: On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote: Jarle Bjørgeengen wrote: Great, thanks Manuel. Just out of curiosity, any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with confidence intervals)? - Jarle Bjørgeengen On May 24, 2009, at 3:02 , Manuel Morales wrote: You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So: qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x)) my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x)) Minor improvement: mean(x) + qt.fun(x)*c(-1,1), but in general confidence limits should be asymmetric (a la bootstrap). Thanks, if the data are normally distributed, a symmetric confidence interval should be OK, right? Yes; I do see a normal distribution about once every 10 years. When plotting the individual sample, it looks normally distributed. An appropriate qqnorm plot is a better way to check, but often the data cannot tell you much about their own normality. It's usually better to use methods (e.g., bootstrap) that do not assume normality and that provide skewed confidence intervals if the data are skewed. Frank Best regards. Jarle Bjørgeengen -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] sciplot question
Dear Frank, et al.: Frank E Harrell Jr wrote: snip Yes; I do see a normal distribution about once every 10 years. To what do you attribute the nonnormality you see in most cases? (1) Unmodeled components of variance that can generate errors in interpretation if ignored, even with bootstrapping? (2) Honest outliers that do not relate to the phenomena of interest and would better be removed through improved checks on data quality, but where bootstrapping is appropriate (provided the data are not also contaminated with (1))? (3) Situations where the physical application dictates a different distribution such as binomial, lognormal, gamma, etc., possibly also contaminated with (1) and (2)? I've fit mixtures of normals to data before, but one needs to be careful about not carrying that to extremes, as the mixture may be a result of (1) and therefore not replicable. George Box once remarked that he thought most designed experiments included split plotting that had been ignored in the analysis. That is only a special case of (1). Thanks, Spencer Graves
Re: [R] sciplot question
spencerg wrote: Dear Frank, et al.: Frank E Harrell Jr wrote: snip Yes; I do see a normal distribution about once every 10 years. To what do you attribute the nonnormality you see in most cases? (1) Unmodeled components of variance that can generate errors in interpretation if ignored, even with bootstrapping? (2) Honest outliers that do not relate to the phenomena of interest and would better be removed through improved checks on data quality, but where bootstrapping is appropriate (provided the data are not also contaminated with (1))? (3) Situations where the physical application dictates a different distribution such as binomial, lognormal, gamma, etc., possibly also contaminated with (1) and (2)? I've fit mixtures of normals to data before, but one needs to be careful about not carrying that to extremes, as the mixture may be a result of (1) and therefore not replicable. George Box once remarked that he thought most designed experiments included split plotting that had been ignored in the analysis. That is only a special case of (1). Thanks, Spencer Graves Spencer, Those are all important reasons for non-normality of margin distributions. But the biggest reason of all is that the underlying process did not know about the normal distribution. Normality in raw data is usually an accident. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
[R] sciplot question
Hi, I would like to have lineplot.CI and barplot.CI to actually plot confidence intervals, instead of standard error. I understand I have to use the ci.fun option, but I'm not quite sure how. Like this: qt(0.975, df=n-1)*s/sqrt(n) - but how can I apply it to visualize the length of the Student's t confidence intervals rather than the standard error of the plotted means? -- Best regards Jarle Bjørgeengen Mob: +47 9155 7978 http://www.uio.no/sok?person=jb