Re: [R] sciplot question

2009-05-26 Thread Jarle Bjørgeengen


On May 26, 2009, at 4:37 , Frank E Harrell Jr wrote:


Manuel Morales wrote:

On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote:

Jarle Bjørgeengen wrote:

On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote:


Jarle Bjørgeengen wrote:

On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote:

Jarle Bjørgeengen wrote:

Great,
thanks Manuel.
Just out of curiosity, is there any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with the confidence interval)?

- Jarle Bjørgeengen
On May 24, 2009, at 3:02 , Manuel Morales wrote:
You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So:

qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))
Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in general  
confidence limits should be asymmetric (a la bootstrap).

Thanks,
if the data are normally distributed, a symmetric confidence interval should be OK, right?

Yes; I do see a normal distribution about once every 10 years.
Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too?


A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case.
Incorrect.  Try running some simulations on highly skewed data.  You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05.


The larger the sample size, the more skewness has to be present to  
cause this problem.
OK - I'm convinced. It turns out that the first change I made to  
sciplot

was to allow for asymmetric error bars. Is there an easy way (i.e.,
existing package) to bootstrap confidence intervals in R. If so, I'll
try to incorporate this as an option in sciplot.


library(Hmisc)
?smean.cl.boot
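
For readers who have not used it: smean.cl.boot computes the sample mean together with nonparametric percentile-bootstrap confidence limits and returns them as a named vector. A minimal sketch of a call, on invented data:

library(Hmisc)
set.seed(1)
x <- rexp(30)       # an illustrative, clearly skewed sample
smean.cl.boot(x)    # named vector with elements Mean, Lower and Upper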



H(arrell)misc :-)

Thanks for valuable input Frank.

This seems to work fine. (Slightly more time consuming, but what do we have CPU power for?)


library(Hmisc)
library(sciplot)
my.ci <- function(x) c(smean.cl.boot(x)[2], smean.cl.boot(x)[3])

lineplot.CI(V1, V2, data=d, col=c(4), err.col=c(1), err.width=0.02, legend=FALSE, xlab="Timeofday", ylab="IOPS", ci.fun=my.ci, cex=0.5, lwd=0.7)


Have I understood you correctly that this is a more accurate way of visualizing variability in any dataset than the Student's t confidence intervals, because it does not assume normality?


Can you explain the meaning of B, and how to find a sensible value (if the default is not sufficient)?


Best regards
Jarle Bjørgeengen



Re: [R] sciplot question

2009-05-26 Thread Frank E Harrell Jr

Jarle Bjørgeengen wrote:


On May 26, 2009, at 4:37 , Frank E Harrell Jr wrote:


Manuel Morales wrote:

On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote:

Jarle Bjørgeengen wrote:

On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote:


Jarle Bjørgeengen wrote:

On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote:

Jarle Bjørgeengen wrote:

Great,
thanks Manuel.
Just out of curiosity, is there any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with the confidence interval)?

- Jarle Bjørgeengen
On May 24, 2009, at 3:02 , Manuel Morales wrote:
You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So:

qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))
Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in general 
confidence limits should be asymmetric (a la bootstrap).

Thanks,
if the data are normally distributed, a symmetric confidence interval should be OK, right?

Yes; I do see a normal distribution about once every 10 years.
Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too?


A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case.
Incorrect.  Try running some simulations on highly skewed data.  You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05.


The larger the sample size, the more skewness has to be present to 
cause this problem.

OK - I'm convinced. It turns out that the first change I made to sciplot
was to allow for asymmetric error bars. Is there an easy way (i.e.,
existing package) to bootstrap confidence intervals in R. If so, I'll
try to incorporate this as an option in sciplot.


library(Hmisc)
?smean.cl.boot



H(arrell)misc :-)

Thanks for valuable input Frank.

This seems to work fine. (Slightly more time consuming, but what do we have CPU power for?)


library(Hmisc)
library(sciplot)
my.ci <- function(x) c(smean.cl.boot(x)[2], smean.cl.boot(x)[3])


Don't double the execution time by running it twice!  And this way you might possibly get an upper confidence limit that is lower than the lower one.  Do function(x) smean.cl.boot(x)[-1]
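
The pitfall being: each call to smean.cl.boot draws its own set of bootstrap resamples, so pairing the Lower from one call with the Upper from another can yield an inconsistent interval, while one call returns both limits from the same resamples. A small sketch on invented data:

library(Hmisc)
set.seed(1)
x <- rexp(20)                                  # illustrative skewed sample
c(smean.cl.boot(x)[2], smean.cl.boot(x)[3])    # two calls, two independent bootstrap runs
my.ci <- function(x) smean.cl.boot(x)[-1]      # one call: drop the Mean, keep Lower and Upper
my.ci(x)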




lineplot.CI(V1, V2, data=d, col=c(4), err.col=c(1), err.width=0.02, legend=FALSE, xlab="Timeofday", ylab="IOPS", ci.fun=my.ci, cex=0.5, lwd=0.7)



Have I understood you correctly that this is a more accurate way of visualizing variability in any dataset than the Student's t confidence intervals, because it does not assume normality?


Yes, but instead of saying variability (which quantiles are good at) we are talking about the precision of the mean.




Can you explain the meaning of B, and how to find a sensible value (if the default is not sufficient)?


For most purposes the default is sufficient.  There are great books and 
papers on the bootstrap for more info, including improved variations on 
the simple bootstrap percentile confidence interval used here.
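
For reference, B is the number of bootstrap resamples that smean.cl.boot draws (on the order of 1000 by default, in the Hmisc versions I have seen); a larger B makes the limits fluctuate less from run to run, at the cost of time. A short sketch on invented data:

library(Hmisc)
set.seed(1)
x <- rexp(50)                                        # illustrative data
smean.cl.boot(x)                                     # default number of resamples
smean.cl.boot(x, B = 10000)                          # more resamples: slower, steadier limits
my.ci <- function(x) smean.cl.boot(x, B = 10000)[-1] # the same idea inside a ci.fun for sciplot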


Frank



Best regards
Jarle Bjørgeengen







--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] sciplot question

2009-05-26 Thread Jarle Bjørgeengen


On May 26, 2009, at 3:02 , Frank E Harrell Jr wrote:


Jarle Bjørgeengen wrote:

On May 26, 2009, at 4:37 , Frank E Harrell Jr wrote:

Manuel Morales wrote:

On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote:

Jarle Bjørgeengen wrote:

On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote:


Jarle Bjørgeengen wrote:

On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote:

Jarle Bjørgeengen wrote:

Great,
thanks Manuel.
Just out of curiosity, is there any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with the confidence interval)?

- Jarle Bjørgeengen
On May 24, 2009, at 3:02 , Manuel Morales wrote:
You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So:

qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))
Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in  
general confidence limits should be asymmetric (a la  
bootstrap).

Thanks,
if the data are normally distributed, a symmetric confidence interval should be OK, right?

Yes; I do see a normal distribution about once every 10 years.
Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too?


A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case.
Incorrect.  Try running some simulations on highly skewed data.  You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05.


The larger the sample size, the more skewness has to be present  
to cause this problem.
OK - I'm convinced. It turns out that the first change I made to  
sciplot

was to allow for asymmetric error bars. Is there an easy way (i.e.,
existing package) to bootstrap confidence intervals in R. If so,  
I'll

try to incorporate this as an option in sciplot.


library(Hmisc)
?smean.cl.boot

H(arrell)misc :-)
Thanks for valuable input Frank.
This seems to work fine. (Slightly more time consuming, but what do we have CPU power for?)

library(Hmisc)
library(sciplot)
my.ci <- function(x) c(smean.cl.boot(x)[2], smean.cl.boot(x)[3])


Don't double the execution time by running it twice!  And this way you might possibly get an upper confidence limit that is lower than the lower one.  Do function(x) smean.cl.boot(x)[-1]




D'oh

lineplot.CI(V1, V2, data=d, col=c(4), err.col=c(1), err.width=0.02, legend=FALSE, xlab="Timeofday", ylab="IOPS", ci.fun=my.ci, cex=0.5, lwd=0.7)

Have I understood you correctly that this is a more accurate way of visualizing variability in any dataset than the Student's t confidence intervals, because it does not assume normality?


Yes, but instead of saying variability (which quantiles are good at) we are talking about the precision of the mean.




Right.

Can you explain the meaning of B, and how to find a sensible value (if the default is not sufficient)?


For most purposes the default is sufficient.  There are great books  
and papers on the bootstrap for more info, including improved  
variations on the simple bootstrap percentile confidence interval  
used here.


Frank


Once again, thanks.

Best regards
- Jarle Bjørgeengen



Re: [R] sciplot question

2009-05-25 Thread Jarle Bjørgeengen


On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote:


Jarle Bjørgeengen wrote:

On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote:

Jarle Bjørgeengen wrote:

Great,
thanks Manuel.
Just out of curiosity, is there any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with the confidence interval)?

- Jarle Bjørgeengen
On May 24, 2009, at 3:02 , Manuel Morales wrote:
You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So:

qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))


Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in general  
confidence limits should be asymmetric (a la bootstrap).

Thanks,
if the data are normally distributed, a symmetric confidence interval should be OK, right?


Yes; I do see a normal distribution about once every 10 years.


Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too?


A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case.


Best rgds
Jarle Bjørgeengen



Re: [R] sciplot question

2009-05-25 Thread Frank E Harrell Jr

Jarle Bjørgeengen wrote:


On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote:


Jarle Bjørgeengen wrote:

On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote:

Jarle Bjørgeengen wrote:

Great,
thanks Manuel.
Just out of curiosity, is there any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with the confidence interval)?

- Jarle Bjørgeengen
On May 24, 2009, at 3:02 , Manuel Morales wrote:
You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So:

qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))


Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in general 
confidence limits should be asymmetric (a la bootstrap).

Thanks,
if the data are normally distributed, a symmetric confidence interval should be OK, right?


Yes; I do see a normal distribution about once every 10 years.


Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too?


A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case.

Incorrect.  Try running some simulations on highly skewed data.  You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05.


The larger the sample size, the more skewness has to be present to cause 
this problem.
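
A minimal sketch of such a simulation (exponential data and n = 20, chosen purely for illustration), checking overall coverage and the two tail miss rates of the t interval:

set.seed(1)
n <- 20; nsim <- 10000
above <- below <- 0
for (i in 1:nsim) {
  x  <- rexp(n)                                 # true mean = 1, strongly right-skewed
  ci <- mean(x) + qt(c(.025, .975), df = n - 1) * sd(x) / sqrt(n)
  if (ci[1] > 1) above <- above + 1             # interval lies entirely above the true mean
  if (ci[2] < 1) below <- below + 1             # interval lies entirely below the true mean
}
c(coverage = 1 - (above + below) / nsim,        # nominal 0.95
  miss.above = above / nsim,                    # nominal 0.025
  miss.below = below / nsim)                    # nominal 0.025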


Frank



Best rgds
Jarle Bjørgeengen





--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] sciplot question

2009-05-25 Thread spencerg

Frank E Harrell Jr wrote:

spencerg wrote:

Dear Frank, et al.:

Frank E Harrell Jr wrote:

snip
Yes; I do see a normal distribution about once every 10 years.


 To what do you attribute the nonnormality you see in most cases?
  (1) Unmodeled components of variance that can generate 
errors in interpretation if ignored, even with bootstrapping?


  (2) Honest outliers that do not relate to the phenomena of 
interest and would better be removed through improved checks on data 
quality, but where bootstrapping is appropriate (provided the data 
are not also contaminated with (1))?


  (3) Situations where the physical application dictates a 
different distribution such as binomial, lognormal, gamma, etc., 
possibly also contaminated with (1) and (2)?


 I've fit mixtures of normals to data before, but one needs to be 
careful about not carrying that to extremes, as the mixture may be a 
result of (1) and therefore not replicable.


 George Box once remarked that he thought most designed 
experiments included split plotting that had been ignored in the 
analysis.  That is only a special case of (1).


 Thanks,
 Spencer Graves


Spencer,

Those are all important reasons for non-normality of margin 
distributions.  But the biggest reason of all is that the underlying 
process did not know about the normal distribution.  Normality in raw 
data is usually an accident.


 Frank: 



 Might there be a difference between the physical and social 
sciences on this issue? 



 The central limit effect works pretty well with many kinds of 
manufacturing data, except that it is often masked by between-lot 
components of variance.  The first differences in log(prices) are often 
long-tailed and negatively skewed.  Standard GARCH and similar models 
handle the long tails well but miss the skewness, at least in what I've 
seen.  I think that can be fixed, but I have not yet seen it done. 



 Social science data, however, often involve discrete scales where 
the raters' interpretations of the scales rarely match any standard 
distribution.  Transforming to latent variables, e.g., via factor 
analysis, may help but does not eliminate the problem.



 Thanks for your comments. 


 Spencer


Frank





Re: [R] sciplot question

2009-05-25 Thread Frank E Harrell Jr

spencerg wrote:

Frank E Harrell Jr wrote:

spencerg wrote:

Dear Frank, et al.:

Frank E Harrell Jr wrote:

snip
Yes; I do see a normal distribution about once every 10 years.


 To what do you attribute the nonnormality you see in most cases?
  (1) Unmodeled components of variance that can generate 
errors in interpretation if ignored, even with bootstrapping?


  (2) Honest outliers that do not relate to the phenomena of 
interest and would better be removed through improved checks on data 
quality, but where bootstrapping is appropriate (provided the data 
are not also contaminated with (1))?


  (3) Situations where the physical application dictates a 
different distribution such as binomial, lognormal, gamma, etc., 
possibly also contaminated with (1) and (2)?


 I've fit mixtures of normals to data before, but one needs to be 
careful about not carrying that to extremes, as the mixture may be a 
result of (1) and therefore not replicable.


 George Box once remarked that he thought most designed 
experiments included split plotting that had been ignored in the 
analysis.  That is only a special case of (1).


 Thanks,
 Spencer Graves


Spencer,

Those are all important reasons for non-normality of margin 
distributions.  But the biggest reason of all is that the underlying 
process did not know about the normal distribution.  Normality in raw 
data is usually an accident.


 Frank:

 Might there be a difference between the physical and social 
sciences on this issue?


Hi Spencer,

I doubt that the difference is large, but biological measurements seem 
to be more of a problem.




 The central limit effect works pretty well with many kinds of 
manufacturing data, except that it is often masked by between-lot 
components of variance.  The first differences in log(prices) are often 
long-tailed and negatively skewed.  Standard GARCH and similar models 
handle the long tails well but miss the skewness, at least in what I've 
seen.  I think that can be fixed, but I have not yet seen it done.


The central limit theorem in and of itself doesn't help because it 
doesn't tell you how large N must be before normality holds well enough.




 Social science data, however, often involve discrete scales where 
the raters' interpretations of the scales rarely match any standard 
distribution.  Transforming to latent variables, e.g., via factor 
analysis, may help but does not eliminate the problem.


Good example.  Many of the scales I've seen are non-normal or even 
multi-modal.





 Thanks for your comments.


Thanks for yours
Frank


 Spencer


Frank







--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] sciplot question

2009-05-25 Thread Manuel Morales
On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote:
 Jarle Bjørgeengen wrote:
  
  On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote:
  
  Jarle Bjørgeengen wrote:
  On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote:
  Jarle Bjørgeengen wrote:
  Great,
  thanks Manuel.
  Just out of curiosity, is there any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with the confidence interval)?
  - Jarle Bjørgeengen
  On May 24, 2009, at 3:02 , Manuel Morales wrote:
  You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So:

  qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
  my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))
 
  Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in general 
  confidence limits should be asymmetric (a la bootstrap).
  Thanks,
  if the data are normally distributed, a symmetric confidence interval should be OK, right?
 
  Yes; I do see a normal distribution about once every 10 years.
  
  Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too?
  
  A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case.

 Incorrect.  Try running some simulations on highly skewed data.  You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05.
 
 The larger the sample size, the more skewness has to be present to cause 
 this problem.

OK - I'm convinced. It turns out that the first change I made to sciplot
was to allow for asymmetric error bars. Is there an easy way (i.e.,
existing package) to bootstrap confidence intervals in R. If so, I'll
try to incorporate this as an option in sciplot.

BTW Jarle - to answer an earlier question, standard error is the
standard in my field, ecology, and that's why it's the current default
in sciplot.

Manuel

 
 Frank
 
  
  Best rgds
  Jarle Bjørgeengen
  
  
 
 
-- 
http://mutualism.williams.edu



Re: [R] sciplot question

2009-05-25 Thread Frank E Harrell Jr

Manuel Morales wrote:

On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote:

Jarle Bjørgeengen wrote:

On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote:


Jarle Bjørgeengen wrote:

On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote:

Jarle Bjørgeengen wrote:

Great,
thanks Manuel.
Just out of curiosity, is there any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with the confidence interval)?

- Jarle Bjørgeengen
On May 24, 2009, at 3:02 , Manuel Morales wrote:
You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So:

qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))
Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in general 
confidence limits should be asymmetric (a la bootstrap).

Thanks,
if the data are normally distributed, a symmetric confidence interval should be OK, right?

Yes; I do see a normal distribution about once every 10 years.
Is it not true that the Student's t (qt(...) and so on) confidence intervals are quite robust against non-normality too?


A teacher told me that the Student's t symmetric confidence intervals will give an adequate picture of the variability of the data in this particular case.
Incorrect.  Try running some simulations on highly skewed data.  You will find situations where the confidence coverage is not very close to the stated level (e.g., 0.95) and more situations where the overall coverage is 0.95 because one tail area is near 0 and the other is near 0.05.


The larger the sample size, the more skewness has to be present to cause 
this problem.


OK - I'm convinced. It turns out that the first change I made to sciplot
was to allow for asymmetric error bars. Is there an easy way (i.e.,
existing package) to bootstrap confidence intervals in R. If so, I'll
try to incorporate this as an option in sciplot.


library(Hmisc)
?smean.cl.boot



BTW Jarle - to answer an earlier question, standard error is the
standard in my field, ecology, and that's why it's the current default
in sciplot.


Too bad.
Frank



Manuel


Frank


Best rgds
Jarle Bjørgeengen







--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] sciplot question

2009-05-24 Thread Manuel Morales
You define your own function for the confidence intervals. The function
needs to return the two values representing the upper and lower CI
values. So:

qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))

lineplot.CI(x.factor = dose, response = len, data = ToothGrowth,
ci.fun=my.ci)

Manuel

On Fri, 2009-05-22 at 18:38 +0200, Jarle Bjørgeengen wrote:
 Hi,
 
 I would like to have lineplot.CI and barplot.CI to actually plot  
 confidence intervals , instead of standard error.
 
 I understand I have to use the ci.fun option, but I'm not quite sure  
 how.
 
 Like this :
 
qt(0.975,df=n-1)*s/sqrt(n)
 
 but how can I apply it to visualize the length of the Student's t confidence intervals rather than the standard error of the plotted means?
 
-- 
http://mutualism.williams.edu



Re: [R] sciplot question

2009-05-24 Thread Jarle Bjørgeengen

Great,

thanks Manuel.

Just out of curiosity, is there any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with the confidence interval)?


- Jarle Bjørgeengen

On May 24, 2009, at 3:02 , Manuel Morales wrote:

You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So:

qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))

lineplot.CI(x.factor = dose, response = len, data = ToothGrowth,
   ci.fun=my.ci)

Manuel

On Fri, 2009-05-22 at 18:38 +0200, Jarle Bjørgeengen wrote:

Hi,

I would like to have lineplot.CI and barplot.CI to actually plot
confidence intervals , instead of standard error.

I understand I have to use the ci.fun option, but I'm not quite sure
how.

Like this :


qt(0.975,df=n-1)*s/sqrt(n)


but how can I apply it to visualize the length of the Student's t confidence intervals rather than the standard error of the plotted means?


--
http://mutualism.williams.edu





Re: [R] sciplot question

2009-05-24 Thread Frank E Harrell Jr

Jarle Bjørgeengen wrote:

Great,

thanks Manuel.

Just out of curiosity, is there any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with the confidence interval)?


- Jarle Bjørgeengen

On May 24, 2009, at 3:02 , Manuel Morales wrote:


You define your own function for the confidence intervals. The function
needs to return the two values representing the upper and lower CI
values. So:

qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))


Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in general confidence 
limits should be asymmetric (a la bootstrap).


I'm not sure how NAs are handled.

Frank



lineplot.CI(x.factor = dose, response = len, data = ToothGrowth,
   ci.fun=my.ci)

Manuel

On Fri, 2009-05-22 at 18:38 +0200, Jarle Bjørgeengen wrote:

Hi,

I would like to have lineplot.CI and barplot.CI to actually plot
confidence intervals , instead of standard error.

I understand I have to use the ci.fun option, but I'm not quite sure
how.

Like this :


qt(0.975,df=n-1)*s/sqrt(n)


but how can I apply it to visualize the length of the Student's t confidence intervals rather than the standard error of the plotted means?


--
http://mutualism.williams.edu



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] sciplot question

2009-05-24 Thread Jarle Bjørgeengen


On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote:


Jarle Bjørgeengen wrote:

Great,
thanks Manuel.
Just out of curiosity, is there any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with the confidence interval)?

- Jarle Bjørgeengen
On May 24, 2009, at 3:02 , Manuel Morales wrote:
You define your own function for the confidence intervals. The function needs to return the two values representing the upper and lower CI values. So:

qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))


Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in general  
confidence limits should be asymmetric (a la bootstrap).


Thanks,

if the data are normally distributed, a symmetric confidence interval should be OK, right?


When plotting the individual sample, it looks normally distributed.

Best regards.
Jarle Bjørgeengen


Re: [R] sciplot question

2009-05-24 Thread Frank E Harrell Jr

Jarle Bjørgeengen wrote:


On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote:


Jarle Bjørgeengen wrote:

Great,
thanks Manuel.
Just out of curiosity, is there any particular reason you chose standard error, and not confidence interval, as the default error indication (the naming of the plotting functions associates more closely with the confidence interval)?

- Jarle Bjørgeengen
On May 24, 2009, at 3:02 , Manuel Morales wrote:

You define your own function for the confidence intervals. The function
needs to return the two values representing the upper and lower CI
values. So:

qt.fun <- function(x) qt(p=.975, df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))


Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in general 
confidence limits should be asymmetric (a la bootstrap).


Thanks,

if the data are normally distributed, a symmetric confidence interval should be OK, right?


Yes; I do see a normal distribution about once every 10 years.



When plotting the individual sample, it looks normally distributed.


An appropriate qqnorm plot is a better way to check, but often the data cannot tell you much about their own normality.  It's usually better to use methods (e.g., bootstrap) that do not assume normality and that provide skewed confidence intervals if the data are skewed.
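
A quick sketch of that check with base graphics, on invented data:

set.seed(1)
x <- rexp(40)    # a visibly right-skewed sample
qqnorm(x)        # points should lie near a straight line if x were roughly normal
qqline(x)        # reference line through the quartiles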


Frank



Best regards.
Jarle Bjørgeengen



--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] sciplot question

2009-05-24 Thread spencerg
Dear Frank, et al.: 



Frank E Harrell Jr wrote:

snip
Yes; I do see a normal distribution about once every 10 years.


 To what do you attribute the nonnormality you see in most cases?  



  (1) Unmodeled components of variance that can generate errors 
in interpretation if ignored, even with bootstrapping? 



  (2) Honest outliers that do not relate to the phenomena of 
interest and would better be removed through improved checks on data 
quality, but where bootstrapping is appropriate (provided the data are 
not also contaminated with (1))? 



  (3) Situations where the physical application dictates a 
different distribution such as binomial, lognormal, gamma, etc., 
possibly also contaminated with (1) and (2)? 



 I've fit mixtures of normals to data before, but one needs to be 
careful about not carrying that to extremes, as the mixture may be a 
result of (1) and therefore not replicable. 



 George Box once remarked that he thought most designed experiments 
included split plotting that had been ignored in the analysis.  That is 
only a special case of (1). 



 Thanks,
 Spencer Graves



Re: [R] sciplot question

2009-05-24 Thread Frank E Harrell Jr

spencerg wrote:

Dear Frank, et al.:

Frank E Harrell Jr wrote:

snip
Yes; I do see a normal distribution about once every 10 years.


 To what do you attribute the nonnormality you see in most cases? 

  (1) Unmodeled components of variance that can generate errors 
in interpretation if ignored, even with bootstrapping?


  (2) Honest outliers that do not relate to the phenomena of 
interest and would better be removed through improved checks on data 
quality, but where bootstrapping is appropriate (provided the data are 
not also contaminated with (1))?


  (3) Situations where the physical application dictates a 
different distribution such as binomial, lognormal, gamma, etc., 
possibly also contaminated with (1) and (2)?


 I've fit mixtures of normals to data before, but one needs to be 
careful about not carrying that to extremes, as the mixture may be a 
result of (1) and therefore not replicable.


 George Box once remarked that he thought most designed experiments 
included split plotting that had been ignored in the analysis.  That is 
only a special case of (1).


 Thanks,
 Spencer Graves


Spencer,

Those are all important reasons for non-normality of margin 
distributions.  But the biggest reason of all is that the underlying 
process did not know about the normal distribution.  Normality in raw 
data is usually an accident.


Frank

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



[R] sciplot question

2009-05-22 Thread Jarle Bjørgeengen

Hi,

I would like to have lineplot.CI and barplot.CI to actually plot  
confidence intervals , instead of standard error.


I understand I have to use the ci.fun option, but I'm not quite sure  
how.


Like this :

  qt(0.975,df=n-1)*s/sqrt(n)

but how can I apply it to visualize the length of the Student's t confidence intervals rather than the standard error of the plotted means?


--
Best regards
Jarle Bjørgeengen
Mob: +47 9155 7978
http://www.uio.no/sok?person=jb

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.