Re: [R] population variance and sample variance
Probably not a typo, but a different textbook used originally. Statistics is still a relatively young science, so we have not yet settled on a single set of notation, symbols, and jargon (look at intro textbooks: is p the population proportion, with p-hat the sample proportion, or is p the sample proportion, with pi the population proportion?).

I originally learned that dividing by n gives the 'population' variance, since if you have the entire population then mu is known exactly and you do not need to correct for an unknown mu. You should only divide by n when you have the entire population. When you have a sample, you need to divide by n-1 to adjust for using the sample mean in place of mu. So from that I learned: population, divide by n; sample, divide by n-1.

But I have seen others take the view that dividing a sample sum of squares by n gives the variance of the sample data, while dividing by n-1 gives the estimate of the population variance. So from that thinking: population, divide by n-1; sample, divide by n.

Both make sense, so to be clear it is best to just state the divisor rather than use terms like population and sample and expect them to be unambiguous. I have also seen them referred to as unbiased (n-1) and maximum likelihood (n), but these are not perfect descriptors once you start talking about standard deviations rather than variances.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ista Zahn
Sent: Tuesday, February 02, 2010 12:03 PM
To: Peng Yu
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] population variance and sample variance

Probably a simple typo, but just to keep things straight: you want to divide by n when describing the standard deviation of a sample, and divide by n-1 when estimating a population standard deviation (your initial description had it backwards, I think).
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
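Greg's advice is to name the divisor instead of relying on the words "population" and "sample". A helper could make that explicit in its argument. The sketch below is a minimal version under stated assumptions: `var_div` is a hypothetical name and not a base R function, and it assumes a numeric vector with no missing values.

```r
# Hypothetical helper (not part of base R): the caller states the divisor
# explicitly instead of saying "population" or "sample".
var_div <- function(x, divisor = c("n-1", "n")) {
  divisor <- match.arg(divisor)
  n  <- length(x)
  ss <- sum((x - mean(x))^2)  # sum of squared deviations from the sample mean
  if (divisor == "n-1") ss / (n - 1) else ss / n
}

set.seed(0)
x <- rnorm(4)
var_div(x, "n-1")  # should match var(x)
var_div(x, "n")    # should match ((n - 1) / n) * var(x)
```

Because `match.arg()` is used, any value other than `"n-1"` or `"n"` raises an error. A misspelled divisor therefore cannot silently fall through to one convention.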
Re: [R] population variance and sample variance
Ah, that makes sense. Thanks for the clarification, Greg.

-Ista

On Thu, Feb 4, 2010 at 5:58 PM, Greg Snow greg.s...@imail.org wrote:
> Probably not a typo, but a different textbook used originally.
Re: [R] population variance and sample variance
Checking VAR_SAMP and VAR_POP in the H2 and PostgreSQL databases, and VAR and VARP in Excel, we find that in all three cases the sample variance uses n-1. Here is an R example using H2 and sqldf:

library(RH2)
library(sqldf)
DF <- data.frame(x = 1:3)
sqldf("select VAR_SAMP(x), VAR_POP(x) from DF")
  VAR_SAMP..x.. VAR_POP..x..
1             1        0.667
sum((DF$x - mean(DF$x))^2)/2
[1] 1
var(DF$x)
[1] 1

On Thu, Feb 4, 2010 at 12:58 PM, Greg Snow greg.s...@imail.org wrote:
> Probably not a typo, but a different textbook used originally.
Re: [R] population variance and sample variance
On Thu, Feb 4, 2010 at 11:58 AM, Greg Snow greg.s...@imail.org wrote:
> Both make sense, so to be clear it is best to just state the divisor rather than using terms like population and sample and expecting to be unambiguous.

I'm surprised that even this basic definition does not have a unique name in the nomenclature, which might cause confusion in certain contexts. Just a thought: if both definitions are OK, then the wiki page might be revised: http://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance. After all, many people who are not statisticians rely on the wiki for easy access to simple terms.
Re: [R] population variance and sample variance
Well, a perverse view: ;-) If using n vs n-1 makes a difference in the results, then you have too little data (more properly, error df) to say much about the variance anyway: n vs n-1 is the least of your problems. Otherwise, choose whichever you're in the mood for; just state which, for reproducibility's sake. In other words, why waste any time or energy on such a pointless discussion?

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Greg Snow
Sent: Thursday, February 04, 2010 9:59 AM
To: Ista Zahn; Peng Yu
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] population variance and sample variance

Probably not a typo, but a different textbook used originally.
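Bert's argument can be made a little more concrete. The two divisors differ by a relative amount of exactly 1/n. Under an added assumption of normally distributed data (not stated by Bert), the variance estimate's own relative standard error is sqrt(2/(n-1)). This sketch tabulates both quantities for a few illustrative sample sizes.

```r
# Assumes normal data, for which sd(s^2) / sigma^2 = sqrt(2 / (n - 1)).
n <- c(4, 10, 30, 100, 1000)          # illustrative sample sizes
comparison <- data.frame(
  n              = n,
  divisor_effect = 1 / n,             # relative gap between divisors: 1 - (n - 1) / n
  sampling_noise = sqrt(2 / (n - 1))  # relative standard error of var(x)
)
print(comparison, digits = 3)
```

In every row the divisor effect is smaller than the sampling noise. This fits the view that the choice of divisor is rarely the limiting factor, as long as the choice is reported.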
Re: [R] population variance and sample variance
On Tue, Feb 2, 2010 at 10:25 AM, Peng Yu pengyu...@gmail.com wrote:
> On Mon, Oct 19, 2009 at 12:53 PM, Kingsford Jones kingsfordjo...@gmail.com wrote:
>> sum((x-mean(x))^2)/(n)
>> [1] 0.4894708
>> ((n-1)/n) * var(x)
>> [1] 0.4894708
>
> But there is no built-in function in R to do this, is there?

No, because down that path lies bloat.
Re: [R] population variance and sample variance
On Thu, Feb 4, 2010 at 11:07 AM, Peng Yu pengyu...@gmail.com wrote:
> Just a thought: if both definitions are OK, then the wiki page might be revised: http://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance

I believe the nomenclature in your link is pretty well accepted. The point made earlier is that when sampling, one uses (n-1) to get an unbiased estimate of the *population* value, so one needs to be careful with semantics, and it's really better to be mathematically explicit. Also, when we discuss the variance of a random variable, we are referring to its second central moment, Var(X) = E[(X - E[X])^2], so again the definition is context dependent.

Kingsford
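The unbiasedness point above can be checked with a small Monte Carlo sketch. The same simulation also checks Greg's earlier remark that dividing by n is appropriate when mu is known exactly. The setup is an arbitrary illustrative assumption: normal data, n = 5, mu = 0, sigma^2 = 4, 50,000 replicates, and the seed.

```r
# Monte Carlo sketch; every setting below is an illustrative assumption.
set.seed(42)
n      <- 5        # sample size
mu     <- 0        # true mean
sigma2 <- 4        # true variance
reps   <- 50000    # number of simulated samples

est <- replicate(reps, {
  x  <- rnorm(n, mean = mu, sd = sqrt(sigma2))
  ss <- sum((x - mean(x))^2)              # deviations from the *sample* mean
  c(div_n_minus_1 = ss / (n - 1),         # expectation: sigma2 (unbiased)
    div_n         = ss / n,               # expectation: (n - 1) / n * sigma2
    known_mu      = sum((x - mu)^2) / n)  # mu known: dividing by n is unbiased
})

avg <- rowMeans(est)
print(avg)  # roughly 4, 3.2 and 4, up to simulation noise
```

The divisor-n estimate that uses the sample mean falls short by the factor (n-1)/n. Dividing by n is only unbiased when the deviations are taken from the true mu.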
Re: [R] population variance and sample variance
On Mon, Oct 19, 2009 at 12:53 PM, Kingsford Jones kingsfordjo...@gmail.com wrote:
> sum((x-mean(x))^2)/(n)
> [1] 0.4894708
> ((n-1)/n) * var(x)
> [1] 0.4894708

But there is no built-in function in R to do this, is there?

> hth,
> Kingsford
Re: [R] population variance and sample variance
Probably a simple typo, but just to keep things straight: you want to divide by n when describing the standard deviation of a sample, and divide by n-1 when estimating a population standard deviation (your initial description had it backwards, I think).

On Tue, Feb 2, 2010 at 5:25 PM, Peng Yu pengyu...@gmail.com wrote:
> On Mon, Oct 19, 2009 at 12:53 PM, Kingsford Jones kingsfordjo...@gmail.com wrote:
>> On Mon, Oct 19, 2009 at 9:30 AM, Peng Yu pengyu...@gmail.com wrote:
>>> It seems that var() computes the sample variance. It is straightforward to compute the population variance from the sample variance.

-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
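The standard-deviation version of this distinction can be sketched in R. The two standard deviations differ by the factor sqrt((n-1)/n). The sketch also illustrates Greg's caveat that "unbiased" does not carry over to standard deviations: var(x) is unbiased for sigma^2, but its square root sd(x) underestimates sigma on average. The seeds, n = 4 and the 50,000 replicates are illustrative assumptions.

```r
set.seed(0)
x <- rnorm(4)
n <- length(x)

sd_div_n  <- sqrt(sum((x - mean(x))^2) / n)  # describes the spread of these 4 values
sd_div_n1 <- sd(x)                           # base R sd() divides by n - 1
all.equal(sd_div_n, sqrt((n - 1) / n) * sd_div_n1)  # TRUE

# Caveat: sd(x) is not unbiased for sigma. For normal data with n = 4,
# E[sd(x)] is about 0.921 * sigma.
set.seed(1)
mean_sd <- mean(replicate(50000, sd(rnorm(4))))  # true sigma is 1
mean_sd  # noticeably below 1
```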
Re: [R] population variance and sample variance
sum((x-mean(x))^2)/(n)
[1] 0.4894708
((n-1)/n) * var(x)
[1] 0.4894708

hth,
Kingsford

On Mon, Oct 19, 2009 at 9:30 AM, Peng Yu pengyu...@gmail.com wrote:
> It seems that var() computes the sample variance. It is straightforward to compute the population variance from the sample variance. However, I feel that it would still be convenient to have a function that can compute the population variance. Is there a population variance function available in R?
>
> $ Rscript var.R
> set.seed(0)
> n = 4
> x = rnorm(n)
> var(x)
> [1] 0.6526278
> sum((x-mean(x))^2)/(n-1)
> [1] 0.6526278
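On the question of a built-in function: base R's var() always divides by n-1. However, stats::cov.wt() has a method = "ML" option which, with its default equal weights, divides by n. The sketch below compares it with the two formulas above. cov.wt() expects a matrix or data frame and returns a covariance matrix, which is why the vector is wrapped in matrix() and the result indexed with [1, 1]. Whether this is more convenient than the one-line formulas is a judgment call.

```r
set.seed(0)
n <- 4
x <- rnorm(n)

pv1 <- sum((x - mean(x))^2) / n   # direct formula, divisor n
pv2 <- ((n - 1) / n) * var(x)     # rescale var(), which divides by n - 1
# With default weights rep(1/n, n), method = "ML" gives the divisor-n variance.
pv3 <- cov.wt(matrix(x), method = "ML")$cov[1, 1]

c(pv1, pv2, pv3)  # all three agree
```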