Re: [R] population variance and sample variance

2010-02-04 Thread Greg Snow
Probably not a typo, but a different textbook used originally.  Statistics is 
still a relatively young science, so we have not yet settled on a single set of 
notation/symbols/jargon.  Look at intro textbooks: is p the population 
proportion (with p-hat for the sample), or is p the sample proportion (with pi 
for the population)?

I originally learned that dividing by n gives the 'population' variance since 
if you have the entire population then mu is known exactly and you do not need 
to correct for unknown mu.  You should only divide by n when you have the 
entire population.  When you have a sample you need to divide by n-1 to adjust 
for using the sample mean.

So from that I learned: population, divide by n; sample, divide by n-1.

But I have seen others take the approach that dividing a sample sum of squares 
by n gives the variance of the sample data, while dividing by n-1 gives the 
estimate of the population variance.

So from that thinking: population, divide by n-1; sample, divide by n.

Both make sense, so to be clear it is best to just state the divisor rather 
than using terms like population and sample and expecting them to be unambiguous.
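
One way to follow that advice in R is to make the divisor an explicit argument.
The helper below is only a sketch: the name var_div and its divisor argument are
made up for illustration and are not part of base R. It reuses the data from the
original post.

```r
# Hypothetical helper (not in base R): the caller must name the divisor.
var_div <- function(x, divisor = c("n-1", "n")) {
  divisor <- match.arg(divisor)
  n <- length(x)
  ss <- sum((x - mean(x))^2)          # sum of squared deviations from the mean
  if (divisor == "n") ss / n else ss / (n - 1)
}

set.seed(0)
x <- rnorm(4)                         # same data as in the original post
v_nm1 <- var_div(x, "n-1")            # agrees with var(x)
v_n   <- var_div(x, "n")              # equals ((n - 1) / n) * var(x)
```

Forcing the caller to spell out "n" or "n-1" avoids the population/sample
wording entirely.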

I have also seen them referred to as unbiased (n-1) and maximum likelihood (n), 
but these are not perfect descriptors once you start talking about standard 
deviations rather than variances.
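
A small simulation can illustrate that last point. This is a sketch under
assumed settings (standard normal data, samples of size 5, 20000 replicates,
all chosen only for illustration): the n-1 variance should average close to the
true variance of 1, the n variance close to (n-1)/n, and the square root of the
n-1 variance should still average below the true standard deviation of 1.

```r
set.seed(1)                            # arbitrary seed, for reproducibility only
n    <- 5
reps <- 20000
samples <- matrix(rnorm(n * reps), nrow = reps)  # one sample per row; true variance is 1

v_nm1 <- apply(samples, 1, var)        # divisor n - 1
v_n   <- v_nm1 * (n - 1) / n           # divisor n

mean_v_nm1 <- mean(v_nm1)              # close to 1: unbiased for the variance
mean_v_n   <- mean(v_n)                # close to (n - 1) / n = 0.8
mean_s_nm1 <- mean(sqrt(v_nm1))        # below 1: the square root is not unbiased for the sd
```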

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] population variance and sample variance

2010-02-04 Thread Ista Zahn
Ah, that makes sense. Thanks for the clarification Greg.

-Ista


-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


Re: [R] population variance and sample variance

2010-02-04 Thread Gabor Grothendieck
Checking VAR_SAMP and VAR_POP in the H2 and PostgreSQL databases and
VAR and VARP in Excel we find that in all three cases the sample
variance uses n-1.  Here is an R example using H2 and sqldf:

 library(RH2)
 library(sqldf)
 DF <- data.frame(x = 1:3)
 sqldf("select VAR_SAMP(x), VAR_POP(x) from DF")
  VAR_SAMP..x.. VAR_POP..x..
1             1        0.667

 sum((DF$x - mean(DF$x))^2)/2
[1] 1
 var(DF$x)
[1] 1


Re: [R] population variance and sample variance

2010-02-04 Thread Peng Yu
I'm surprised that even this basic definition does not have a unique
name in the nomenclature, which might cause confusion in certain
contexts. Just a thought: if both definitions are OK, then the wiki
page might be revised:
http://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance
After all, many people who are not pure statisticians rely on the wiki
for easy access to simple terms.


Re: [R] population variance and sample variance

2010-02-04 Thread Bert Gunter
 Well, a perverse view: ;-)

If using n vs n-1 makes a difference in the results, then you have too
little data (more properly, error df) to say much about the variance anyway:
n vs n-1 is the least of your problems. 

Otherwise, choose whichever you're in the mood for. Just state which for
reproducibility's sake.

In other words, why waste any time or energy on such a pointless discussion?
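
To put rough numbers on this: the two versions differ only by the factor
(n-1)/n, so their relative difference is exactly 1/n. A quick sketch (the
sample sizes below are arbitrary illustrations):

```r
n <- c(5, 10, 100, 1000)               # arbitrary sample sizes
ratio    <- (n - 1) / n                # (divide-by-n variance) / (divide-by-(n-1) variance)
rel_diff <- 1 - ratio                  # relative difference between the two, equal to 1 / n
data.frame(n, ratio, rel_diff)
```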


Bert Gunter
Genentech Nonclinical Statistics


Re: [R] population variance and sample variance

2010-02-04 Thread Kingsford Jones
On Tue, Feb 2, 2010 at 10:25 AM, Peng Yu pengyu...@gmail.com wrote:
 On Mon, Oct 19, 2009 at 12:53 PM, Kingsford Jones
 kingsfordjo...@gmail.com wrote:
 sum((x-mean(x))^2)/(n)
 [1] 0.4894708
 ((n-1)/n) * var(x)
 [1] 0.4894708

 But there is no built-in function in R to do this, right?


No, because down that path lies bloat.
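
One built-in that comes close is cov.wt() in the stats package: with its
default equal weights, method = "ML" divides by n and the default method
"unbiased" divides by n-1. It expects a matrix or data frame, so a plain vector
has to be wrapped first. A minimal sketch, assuming the data from the original
post:

```r
set.seed(0)
n <- 4
x <- rnorm(n)                                            # same data as in the original post

v_ml  <- cov.wt(as.matrix(x), method = "ML")$cov[1, 1]   # divisor n
v_unb <- cov.wt(as.matrix(x))$cov[1, 1]                  # default method "unbiased": divisor n - 1
```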



Re: [R] population variance and sample variance

2010-02-04 Thread Kingsford Jones
On Thu, Feb 4, 2010 at 11:07 AM, Peng Yu pengyu...@gmail.com wrote:

 I'm surprised that even this basic definition does not have a unique
 name in the nomenclature, which might cause confusion in certain
 contexts. Just a thought: if both definitions are OK, then the wiki
 page might be revised:
 http://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance
 After all, many people who are not pure statisticians rely on the wiki
 for easy access to simple terms.

I believe the nomenclature in your link is pretty well accepted.  The
point made earlier is that when sampling one uses (n-1) to get an
unbiased estimate of the *population* value, so one needs to be
careful with semantics, and it's really better to be mathematically
explicit.  Also, when we discuss the variance of a random variable we
are referring to its 2nd central moment (Var X = E[(X - E X)^2]), so
again the definition is context dependent.
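
The unbiasedness claim can be checked in a few lines. The following is the
standard derivation, sketched under the assumption that X_1, ..., X_n are
independent draws with common mean \mu and variance \sigma^2 (the symbol
\sigma^2 is introduced here and does not appear elsewhere in the thread):

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2, \\
\mathrm{E}\!\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2 .
\end{align*}
```

So dividing the sum of squares by n-1 gives an estimator whose expectation is
\sigma^2, while dividing by n gives expectation (n-1)\sigma^2/n.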

Kingsford






Re: [R] population variance and sample variance

2010-02-02 Thread Peng Yu
On Mon, Oct 19, 2009 at 12:53 PM, Kingsford Jones
kingsfordjo...@gmail.com wrote:
 sum((x-mean(x))^2)/(n)
 [1] 0.4894708
 ((n-1)/n) * var(x)
 [1] 0.4894708

But there is no built-in function in R to do this, right?


Re: [R] population variance and sample variance

2010-02-02 Thread Ista Zahn
Probably a simple typo, but just to keep things straight: you want to
divide by n when describing the standard deviation of a sample, and
divide by n-1 when estimating a population standard deviation (your
initial description had it backwards I think).


-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


Re: [R] population variance and sample variance

2009-10-19 Thread Kingsford Jones
 sum((x-mean(x))^2)/(n)
[1] 0.4894708
 ((n-1)/n) * var(x)
[1] 0.4894708


hth,
Kingsford

On Mon, Oct 19, 2009 at 9:30 AM, Peng Yu pengyu...@gmail.com wrote:
 It seems that var() computes sample variance. It is straightforward
 to compute population variance from sample variance. However, I feel
 that it would still be convenient to have a function that computes
 population variance. Is there a population variance function available
 in R?

 $ Rscript var.R
 set.seed(0)
 n = 4
 x = rnorm(n)
 var(x)
 [1] 0.6526278
 sum((x-mean(x))^2)/(n-1)
 [1] 0.6526278

