The argument against doing weighted analysis to account
for oversampling is a strong one, as weighted analyses
produce estimates with higher variance.  Cluster sampling
is an altogether different issue.  To get correct variance
estimates, clustering must be taken into account.  Fortunately,
this is often simple, using either the cluster bootstrap or the
cluster version of the Huber sandwich covariance estimator.
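
A minimal sketch of both approaches, in Python and for the simplest
case of estimating a mean (an intercept-only model).  The data are
simulated and the function names are hypothetical, chosen only for
illustration; a regression model would need the matrix form of the
same calculations.

```python
import random
from collections import defaultdict


def naive_se(values):
    """Usual standard error of the mean, ignoring clustering."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (var / n) ** 0.5


def cluster_bootstrap_se(values, cluster_ids, n_boot=2000, seed=0):
    """SE of the mean from resampling whole clusters with replacement."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for v, c in zip(values, cluster_ids):
        groups[c].append(v)
    clusters = list(groups.values())
    means = []
    for _ in range(n_boot):
        # Draw as many clusters as were observed, keeping each one intact.
        sample = [v for _ in clusters for v in rng.choice(clusters)]
        means.append(sum(sample) / len(sample))
    grand = sum(means) / n_boot
    return (sum((m - grand) ** 2 for m in means) / (n_boot - 1)) ** 0.5


def cluster_sandwich_se(values, cluster_ids):
    """Cluster (Huber) sandwich SE for an intercept-only model.

    No small-sample correction is applied here.
    """
    n = len(values)
    mean = sum(values) / n
    resid_sums = defaultdict(float)
    for v, c in zip(values, cluster_ids):
        # Residuals are summed within each cluster before squaring.
        resid_sums[c] += v - mean
    # "Bread" is 1/n; "meat" is the sum of squared cluster totals.
    return sum(s ** 2 for s in resid_sums.values()) ** 0.5 / n


# Simulated data (an assumption): 30 clusters of 10 observations,
# each cluster sharing a random effect that induces correlation.
data_rng = random.Random(1)
values, cluster_ids = [], []
for c in range(30):
    effect = data_rng.gauss(0, 1)
    for _ in range(10):
        values.append(effect + data_rng.gauss(0, 1))
        cluster_ids.append(c)

se_naive = naive_se(values)
se_boot = cluster_bootstrap_se(values, cluster_ids)
se_sandwich = cluster_sandwich_se(values, cluster_ids)
print("naive SE:            ", round(se_naive, 4))
print("cluster bootstrap SE:", round(se_boot, 4))
print("cluster sandwich SE: ", round(se_sandwich, 4))
```

With positively correlated observations inside clusters, the two
cluster-aware standard errors should roughly agree with each other
and exceed the naive one.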

Frank Harrell


Jan Brogger wrote:
> 
> After I sent the original mail, I found this in the Encyclopedia of
> Biostatistics (2):
> 
> "There is an ongoing debate as to whether the sample design must be
> considered when deriving statistical models (as opposed to estimates of
> means, proportions, totals, and ratios) based on sample survey data.
> Analysts interested in using statistical techniques such as linear
> regression, logistic regression, survival analysis, or categorical data
> analysis on survey data are divided as to whether they feel it is necessary
> to use specialized software. The model-based analysts argue that, as long
> as the model is specified correctly, they can proceed without recognizing
> aspects of the survey design (such as stratification, clustering, and
> unequal selection probabilities), and can therefore use standard
> statistical packages. The design-based analysts argue to the contrary that
> it is important to account for the survey design when estimating models.
> The debate between these two factions has been ongoing for quite awhile and
> is not likely to be resolved soon (Groves [14], Skinner et al. [29], Korn
> and Graubard [22], Hansen et al. [16]). A compromise position adopted by
> some is to use standard statistical software in modeling analyses, but to
> incorporate into the model the variables that were used to define the
> strata, the PSUs and the weights."
> 
> Most epidemiologists are model builders, not population describers. If you
> do a "once-and-for-all" multiple imputation, you can account for many of
> the features of a two-stage survey (except that I don't know about the
> clustering thing). Am I right?
> 
> Small typo correction:
> 
> "Case 5: instead of a simple random sample drawn from the non-responders,
> draw a _stratified sample_ with differential sampling probabilities,
> depending on Y.  "
> should read
> "Case 5: instead of a simple random sample drawn from the non-responders,
> draw a _stratified sample_ from responders with differential sampling
> probabilities, depending on Y."
> 
> 1. Brogan DJ. Pitfalls of Using Standard Statistical Software Packages for
> Sample Survey Data. In: Armitage P and Colton T, eds. Encyclopedia of
> biostatistics. Chichester: John Wiley & Sons Ltd, 1998.
> http://www.fas.harvard.edu/~stats/survey-soft/blc_eob.html
> 2. Carlson BL. Software for Statistical Analysis of Sample Survey Data. In:
> Armitage P and Colton T, eds. Encyclopedia of biostatistics. Chichester:
> John Wiley & Sons Ltd, 1998.
> http://www.fas.harvard.edu/~stats/survey-soft/donna_brogan.html
> 
> Yours,
> Jan Brogger

-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
