Re: [R] How to do bootstrap for the complex sample design?

2010-11-07 Thread Fei xu

Dear Professor Lumley;
 
Thank you so much for your invaluable advice!
 
I will digest your advice and try different methods.
 
Great thanks again!
 
Faye 
 
 Date: Fri, 5 Nov 2010 08:24:00 +1300
 Subject: Re: [R] How to do bootstrap for the complex sample design?
 From: tlum...@uw.edu
 To: timhesterb...@gmail.com
 CC: feix...@hotmail.com; r-help@r-project.org
 
 On Fri, Nov 5, 2010 at 3:51 AM, Tim Hesterberg timhesterb...@gmail.com 
 wrote:
  Faye wrote:
 Our survey is structured as follows: the area to be investigated is
 divided into 6 regions; within each region, one urban community and one
 rural community are randomly selected; then samples are randomly drawn
 from each selected urban and rural community.
 
 The problem is that in each urban/rural stratum, we only have one sample.
 In this case, how do we do the bootstrap?
 
  You are lucky that your sample size is 1.  If it were 2 you would
  probably have proceeded without realizing that the answers were wrong.
 
  Suppose you had two samples in each stratum.  If you proceed naturally,
  drawing bootstrap samples of size 2 from each stratum, this would
  underestimate the variance by a factor of 2.
 
  In general the ordinary nonparametric bootstrap estimates of variability
  are biased downward by a factor of (n-1)/n -- exactly for the mean,
  approximately for other statistics.  In multiple-sample and stratified
  situations, the bias depends on the stratum sizes.
 
  Three remedies are:
  * draw bootstrap samples of size n-1
  * bootknife sampling - omit one observation (a jackknife sample), then
   draw a bootstrap sample of size n from that
  * bootstrap from a kernel density estimate, with kernel covariance equal
   to empirical covariance (with divisor n-1) / n.
  The latter two are described in
  Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. 
  Smoothing, Proceedings of the Section on Statistics and the Environment, 
  American Statistical Association, 2924-2930.
  http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
 
  All three are undefined for samples of size 1.  You need to go to some
  other bootstrap, e.g. a parametric bootstrap with variability estimated
  from other data.
 
 
 And the 'survey' package supplies the first option. (It also supplies
 a bootstrap sample of size n that allows finite population
 corrections, designed for situations with a large n and a high
 sampling fraction, such as some business surveys.)
 
 With a sample size of 1 per stratum there are no design-unbiased
 estimators of the standard error, so as others have said you need
 external data.
 
 -thomas
 
 
 -- 
 Thomas Lumley
 Professor of Biostatistics
 University of Auckland
  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to do bootstrap for the complex sample design?

2010-11-04 Thread Fei xu

Hello;
 
Our survey is structured as follows: the area to be investigated is divided
into 6 regions; within each region, one urban community and one rural
community are randomly selected; then samples are randomly drawn from each
selected urban and rural community.
 
The problem is that in each urban/rural stratum, we only have one sample.
In this case, how do we do the bootstrap?
 
Any comments or hints are greatly appreciated!
 
Faye  



Re: [R] How to do bootstrap for the complex sample design?

2010-11-04 Thread Robert A LaBudde

At 01:38 AM 11/4/2010, Fei xu wrote:


Hello;

Our survey is structured as follows: the area to be investigated is divided
into 6 regions; within each region, one urban community and one rural
community are randomly selected; then samples are randomly drawn from each
selected urban and rural community.

The problem is that in each urban/rural stratum, we only have one sample.
In this case, how do we do the bootstrap?

Any comments or hints are greatly appreciated!

Faye


Just make a table of your data, with each row corresponding to a 
measurement. Your columns will be Region, UrbanCommunity, 
RuralCommunity and your response variables.


Bootstrap resampling is just generating random row indices into this 
table, with replacement. I.e.,


index <- sample(1:N, N, replace=TRUE)

where N is the number of rows. Then your resample is myTable[index,].

Because you chose UrbanCommunity and RuralCommunity randomly, this 
shouldn't be a problem. The fact that you choose a subsample size of 
1 means you won't be able to estimate within-region variances unless 
you make some serious assumptions (e.g., UrbanCommunity effect 
independent of Region effect).
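
As a minimal sketch of the row resampling described above, assuming a small made-up table (the single Community column, the response y, and all values in myTable are hypothetical simplifications, not Faye's data):

```r
# Hypothetical data: one row per measurement (values are made up).
set.seed(1)
myTable <- data.frame(
  Region    = rep(1:6, each = 4),
  Community = rep(c("urban", "rural"), times = 12),
  y         = rnorm(24, mean = 50, sd = 10)
)
N <- nrow(myTable)

# One bootstrap resample: draw N row indices with replacement.
index <- sample(1:N, N, replace = TRUE)
resample <- myTable[index, ]

# Repeat B times to get the bootstrap distribution of a statistic
# (here the mean of y), and take its standard deviation as the SE.
B <- 1000
boot_means <- replicate(B, mean(myTable$y[sample(1:N, N, replace = TRUE)]))
boot_se <- sd(boot_means)
```

Note that this resamples rows as if they were a simple random sample; by itself it does not account for the regions or communities in the design.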



Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

Vere scire est per causas scire



Re: [R] How to do bootstrap for the complex sample design?

2010-11-04 Thread Tim Hesterberg
Faye wrote:
Our survey is structured as follows: the area to be investigated is
divided into 6 regions; within each region, one urban community and one
rural community are randomly selected; then samples are randomly drawn
from each selected urban and rural community.

The problem is that in each urban/rural stratum, we only have one sample.
In this case, how do we do the bootstrap?

You are lucky that your sample size is 1.  If it were 2 you would
probably have proceeded without realizing that the answers were wrong.

Suppose you had two samples in each stratum.  If you proceed naturally,
drawing bootstrap samples of size 2 from each stratum, this would
underestimate the variance by a factor of 2.

In general the ordinary nonparametric bootstrap estimates of variability
are biased downward by a factor of (n-1)/n -- exactly for the mean, 
approximately for other statistics.  In multiple-sample and stratified
situations, the bias depends on the stratum sizes.
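
The (n-1)/n factor for the mean can be checked exactly, without simulation. A small sketch with made-up data values: the bootstrap variance of the sample mean uses the plug-in variance (divisor n), whereas the usual unbiased estimate uses divisor n-1.

```r
# Made-up sample of size n = 2 (values are arbitrary).
x <- c(3, 7)
n <- length(x)

# Enumerate all n^n equally likely bootstrap resamples and their means.
idx <- expand.grid(rep(list(seq_len(n)), n))
boot_means <- apply(idx, 1, function(i) mean(x[i]))

# Exact bootstrap variance of the mean (average over the n^n resamples).
boot_var <- mean((boot_means - mean(boot_means))^2)

# Usual unbiased estimate of Var(mean): s^2 / n, with s^2 using divisor n - 1.
usual_var <- var(x) / n

ratio <- boot_var / usual_var   # equals (n - 1) / n
```

With n = 2 the ratio is 1/2, which is the factor of 2 mentioned above. It applies to the variance, so the standard error is too small by a factor of sqrt(2).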

Three remedies are:
* draw bootstrap samples of size n-1
* bootknife sampling - omit one observation (a jackknife sample), then
  draw a bootstrap sample of size n from that
* bootstrap from a kernel density estimate, with kernel covariance equal
  to empirical covariance (with divisor n-1) / n.
The latter two are described in 
Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. 
Smoothing, Proceedings of the Section on Statistics and the Environment, 
American Statistical Association, 2924-2930.
http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
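
A rough sketch of the first two remedies for a stratified sample. The function names boot_n_minus_1 and bootknife, the stratum weights, and the data are hypothetical and written only for illustration; they are not taken from the paper or from any package.

```r
# Remedy 1: bootstrap sample of size n - 1 from a stratum of size n.
boot_n_minus_1 <- function(x) {
  n <- length(x)
  stopifnot(n >= 2)                       # undefined for n = 1
  x[sample.int(n, n - 1, replace = TRUE)]
}

# Remedy 2: bootknife. Omit one observation at random, then draw n
# observations with replacement from the remaining n - 1.
bootknife <- function(x) {
  n <- length(x)
  stopifnot(n >= 2)                       # undefined for n = 1
  keep <- x[-sample.int(n, 1)]
  keep[sample.int(n - 1, n, replace = TRUE)]
}

# Made-up stratified data: two strata, of sizes 2 and 5.
set.seed(42)
strata <- list(a = c(3, 7), b = c(10, 12, 9, 15, 11))
w <- c(a = 0.4, b = 0.6)                  # hypothetical stratum weights

# Stratified mean recomputed on resamples drawn independently per stratum.
strat_mean <- function(sampler) {
  sum(w * sapply(strata, function(x) mean(sampler(x))))
}
se_n1 <- sd(replicate(2000, strat_mean(boot_n_minus_1)))
se_bk <- sd(replicate(2000, strat_mean(bootknife)))
```

Both samplers stop with an error on a stratum of size 1, which is the situation in the original question.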

All three are undefined for samples of size 1.  You need to go to some
other bootstrap, e.g. a parametric bootstrap with variability estimated
from other data.
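
One possible reading of the parametric-bootstrap suggestion, as a sketch only. It assumes a normal model for each stratum's community-level estimate, with a between-community standard deviation sigma_ext taken from some external source, and it ignores within-community sampling variance. The estimates, weights, and sigma_ext below are all made-up placeholders.

```r
set.seed(123)

# Made-up community-level estimates, one per stratum (6 regions x urban/rural).
est <- c(52, 47, 55, 49, 60, 51, 48, 45, 58, 50, 53, 46)
w   <- rep(1 / length(est), length(est))   # hypothetical stratum weights

# ASSUMPTION: between-community standard deviation supplied by external data
# (for example an earlier survey); it cannot be estimated from one
# community per stratum.
sigma_ext <- 4

# Parametric bootstrap: simulate each stratum's estimate from a normal model
# centred at the observed value, then recompute the overall weighted mean.
B <- 5000
boot_stat <- replicate(B, sum(w * rnorm(length(est), mean = est, sd = sigma_ext)))
boot_se <- sd(boot_stat)

# For this linear statistic the same quantity is available in closed form.
analytic_se <- sigma_ext * sqrt(sum(w^2))
```

For a nonlinear statistic there is no such closed form and the simulation does the work; either way the result is only as good as the externally supplied sigma_ext.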

Tim Hesterberg



Re: [R] How to do bootstrap for the complex sample design?

2010-11-04 Thread Thomas Lumley
On Fri, Nov 5, 2010 at 3:51 AM, Tim Hesterberg timhesterb...@gmail.com wrote:
 Faye wrote:
Our survey is structured as follows: the area to be investigated is
divided into 6 regions; within each region, one urban community and one
rural community are randomly selected; then samples are randomly drawn
from each selected urban and rural community.

The problem is that in each urban/rural stratum, we only have one sample.
In this case, how do we do the bootstrap?

 You are lucky that your sample size is 1.  If it were 2 you would
 probably have proceeded without realizing that the answers were wrong.

 Suppose you had two samples in each stratum.  If you proceed naturally,
 drawing bootstrap samples of size 2 from each stratum, this would
 underestimate the variance by a factor of 2.

 In general the ordinary nonparametric bootstrap estimates of variability
 are biased downward by a factor of (n-1)/n -- exactly for the mean,
 approximately for other statistics.  In multiple-sample and stratified
 situations, the bias depends on the stratum sizes.

 Three remedies are:
 * draw bootstrap samples of size n-1
 * bootknife sampling - omit one observation (a jackknife sample), then
  draw a bootstrap sample of size n from that
 * bootstrap from a kernel density estimate, with kernel covariance equal
  to empirical covariance (with divisor n-1) / n.
 The latter two are described in
 Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. 
 Smoothing, Proceedings of the Section on Statistics and the Environment, 
 American Statistical Association, 2924-2930.
 http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf

 All three are undefined for samples of size 1.  You need to go to some
 other bootstrap, e.g. a parametric bootstrap with variability estimated
 from other data.


And the 'survey' package supplies the first option. (It also supplies
a bootstrap sample of size n that allows finite population
corrections, designed for situations with a large n and a high
sampling fraction, such as some business surveys.)
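
For reference, a sketch of how these two bootstraps can be requested through as.svrepdesign() in the survey package, using the apistrat example data that ships with the package (not Faye's design). Per the package documentation, type = "subbootstrap" is the Rao-Wu n-1 bootstrap and type = "bootstrap" is the Canty-Davison size-n bootstrap; the seed and the number of replicates here are arbitrary choices.

```r
library(survey)
data(api)   # example data shipped with the survey package

# Stratified sample of schools, with stratum population sizes in fpc.
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

set.seed(2010)
# Rao-Wu bootstrap: resample n - 1 PSUs within each stratum.
d_sub  <- as.svrepdesign(dstrat, type = "subbootstrap", replicates = 200)
# Size-n bootstrap that allows for the finite population correction.
d_boot <- as.svrepdesign(dstrat, type = "bootstrap", replicates = 200)

# Estimates and bootstrap standard errors from each replicate design.
svymean(~api00, d_sub)
svymean(~api00, d_boot)
```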

With a sample size of 1 per stratum there are no design-unbiased
estimators of the standard error, so as others have said you need
external data.

   -thomas


-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland
