After I sent the original mail, I found this in the Encyclopedia of 
Biostatistics (2):

"There is an ongoing debate as to whether the sample design must be 
considered when deriving statistical models (as opposed to estimates of 
means, proportions, totals, and ratios) based on sample survey data. 
Analysts interested in using statistical techniques such as linear 
regression, logistic regression, survival analysis, or categorical data 
analysis on survey data are divided as to whether they feel it is necessary 
to use specialized software. The model-based analysts argue that, as long 
as the model is specified correctly, they can proceed without recognizing 
aspects of the survey design (such as stratification, clustering, and 
unequal selection probabilities), and can therefore use standard 
statistical packages. The design-based analysts argue to the contrary that 
it is important to account for the survey design when estimating models. 
The debate between these two factions has been ongoing for quite awhile and 
is not likely to be resolved soon (Groves [14], Skinner et al. [29], Korn 
and Graubard [22], Hansen et al. [16]). A compromise position adopted by 
some is to use standard statistical software in modeling analyses, but to 
incorporate into the model the variables that were used to define the 
strata, the PSUs and the weights. "

Most epidemiologists are model builders, not population describers. If you 
do a "once-and-for-all" multiple imputation, you can account for many of 
the features of a two-stage survey (although I am not sure how it would 
handle clustering). Am I right?

Small typo correction:

"Case 5: instead of a simple random sample drawn from the non-responders,
draw a _stratified sample_ with differential sampling probabilities,
depending on Y.  "
should read
"Case 5: instead of a simple random sample drawn from the non-responders,
draw a _stratified sample_ from responders with differential sampling
probabilities, depending on Y.  "


1. Brogan DJ. Pitfalls of Using Standard Statistical Software Packages for 
Sample Survey Data. In: Armitage P and Colton T, eds. Encyclopedia of 
Biostatistics. Chichester: John Wiley & Sons Ltd, 1998. 
http://www.fas.harvard.edu/~stats/survey-soft/blc_eob.html
2. Carlson BL. Software for Statistical Analysis of Sample Survey Data. In: 
Armitage P and Colton T, eds. Encyclopedia of Biostatistics. Chichester: 
John Wiley & Sons Ltd, 1998. 
http://www.fas.harvard.edu/~stats/survey-soft/donna_brogan.html

Yours,
Jan Brogger
From fharrell <@t> virginia.edu  Sun Mar 18 12:03:59 2001
From: fharrell <@t> virginia.edu ([email protected])
Date: Sun Jun 26 08:24:58 2005
Subject: IMPUTE: Re: Survey analysis: "ordinary" survey software or multiple 
 imputation
References: <[email protected]>
Message-ID: <[email protected]>

The argument against doing weighted analysis to account
for oversampling is a strong one, as weighted analyses
produce estimates with higher variance.  Cluster sampling
is an altogether different issue.  To get proper variances,
clustering must be taken into account.  Fortunately, this can
often be simple, using the cluster bootstrap or the
cluster version of the Huber sandwich covariance estimator.
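
For readers who want to see the two cluster adjustments side by side,
here is a minimal sketch in Python using only NumPy. It is an illustration
under stated assumptions, not a recommended implementation: the data are
simulated, ordinary least squares stands in for whatever model is being
fitted, the sandwich estimator carries no small-sample correction, and all
function names (ols, naive_se, cluster_sandwich_se, cluster_bootstrap_se,
simulate) are made up for this example.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def naive_se(X, y):
    """Usual OLS standard errors, which treat observations as independent."""
    beta = ols(X, y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

def cluster_sandwich_se(X, y, cluster):
    """Cluster version of the Huber sandwich estimator (no small-sample correction)."""
    beta = ols(X, y)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        idx = cluster == g
        score = X[idx].T @ resid[idx]      # score contributions summed within cluster g
        meat += np.outer(score, score)
    return np.sqrt(np.diag(bread @ meat @ bread))

def cluster_bootstrap_se(X, y, cluster, n_boot=500, seed=0):
    """Cluster bootstrap: resample whole clusters with replacement and refit."""
    rng = np.random.default_rng(seed)
    ids = np.unique(cluster)
    rows = {g: np.flatnonzero(cluster == g) for g in ids}
    betas = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        drawn = rng.choice(ids, size=ids.size, replace=True)
        take = np.concatenate([rows[g] for g in drawn])
        betas[b] = ols(X[take], y[take])
    return betas.std(axis=0, ddof=1)

def simulate(n_clusters=40, per_cluster=25, seed=1):
    """Hypothetical clustered data: a cluster-level covariate and a shared cluster effect."""
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), per_cluster)
    x = np.repeat(rng.normal(size=n_clusters), per_cluster)
    u = np.repeat(rng.normal(size=n_clusters), per_cluster)
    y = 1.0 + 0.5 * x + u + rng.normal(size=cluster.size)
    X = np.column_stack([np.ones(cluster.size), x])
    return X, y, cluster

if __name__ == "__main__":
    X, y, cluster = simulate()
    print("naive SE:    ", naive_se(X, y))
    print("sandwich SE: ", cluster_sandwich_se(X, y, cluster))
    print("bootstrap SE:", cluster_bootstrap_se(X, y, cluster))
```

In a simulation like this, with a strong shared cluster effect, the naive
standard errors should come out noticeably smaller than either cluster
adjustment, which is the point about needing to take clustering into
account to get proper variances.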

Frank Harrell


-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
