IMPUTE: Re: Survey analysis: "ordinary" survey software or multiple imputation

Stephen P. Baker Sun, 18 Mar 2001 12:38:33 -0800
Model-based inference depends strongly on having the correct model and can
be substantially biased when the model is wrong. The payoff comes at small
sample sizes when the model is correct.  Although it doesn't directly
address the question of using SUDAAN vs. standard software, Hansen, Madow,
and Tepping ran simulations comparing 5 estimators on stratified samples,
including model-based and design-based estimators as well as the simple
unbiased stratified estimator.  The model-based estimators were
substantially biased.  At the smaller sample sizes, one of them had slightly
smaller mean squared error (averaged over replications) than the simple
estimator.  At the larger sample sizes, the model-based estimators had the
largest mean squared error of the 5: their variances were the smallest, but
their bias dominated.
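
The bias-versus-variance trade-off described above can be illustrated with a
toy simulation. This is a minimal sketch and not the Hansen, Madow, and
Tepping design: the two strata, their population shares, means, standard
deviation, and the equal allocation are all assumptions chosen for
illustration, and simulate() is a hypothetical helper.

```python
import random
import statistics

# Hypothetical two-stratum population (all numbers are illustrative assumptions).
WEIGHTS = (0.8, 0.2)   # population shares of the two strata
MEANS = (10.0, 10.5)   # stratum means differ slightly, so a common-mean model is wrong
SD = 5.0               # within-stratum standard deviation
TRUE_MEAN = sum(w * m for w, m in zip(WEIGHTS, MEANS))


def simulate(n_per_stratum, reps, rng):
    """Return {estimator: (bias, variance, mse)} over `reps` replications.

    Equal allocation is assumed, so the smaller stratum is oversampled.
    """
    estimates = {"design": [], "model": []}
    for _ in range(reps):
        samples = [[rng.gauss(m, SD) for _ in range(n_per_stratum)] for m in MEANS]
        stratum_means = [statistics.fmean(s) for s in samples]
        # Design-based: weight each stratum mean by its population share.
        estimates["design"].append(
            sum(w * ybar for w, ybar in zip(WEIGHTS, stratum_means))
        )
        # Model-based under a misspecified common-mean model: pool the data
        # and ignore the design.
        estimates["model"].append(statistics.fmean(samples[0] + samples[1]))
    summary = {}
    for name, values in estimates.items():
        bias = statistics.fmean(values) - TRUE_MEAN
        variance = statistics.pvariance(values)
        summary[name] = (bias, variance, variance + bias ** 2)
    return summary


if __name__ == "__main__":
    rng = random.Random(2001)
    for n in (10, 500):
        print("n per stratum =", n)
        for name, (bias, var, mse) in simulate(n, 1000, rng).items():
            print(f"  {name:6s} bias={bias:+.3f} var={var:.4f} mse={mse:.4f}")
```

Under these assumptions the pooled estimator has variance 25/n and bias 0.15,
while the stratified estimator has variance 34/n and no bias (n is the total
sample size), so the pooled estimator should win on mean squared error only
below a total sample size of about 400.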

---------------------

Hansen, M., Madow, W., and Tepping, B. "An evaluation of model-dependent and
probability-sampling inferences in sample surveys". Journal of the American
Statistical Association, December 1983, Vol. 78, No. 384, pp. 776-793.
-.- -.. .---- .--. ..-.
Stephen P. Baker, MScPH                       (508) 856-2625
Lecturer in Biostatistics                     (209) 391-7902 fax
Academic Computing Services
University of Massachusetts Medical School
55 Lake Avenue North                          [EMAIL PROTECTED]
Worcester, MA 01655  USA

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Sunday, March 18, 2001 1:03 PM
Subject: IMPUTE: Re: Survey analysis: "ordinary" survey software or multiple
imputation


> The argument against doing weighted analysis to account
> for oversampling is a strong one, as weighted analyses
> produce estimates with higher variance.  Cluster sampling
> is an altogether different issue.  To get proper variances,
> clustering must be taken into account.  Fortunately, this can
> often be simple, using the cluster bootstrap or the
> cluster version of the Huber sandwich covariance estimator.
>
> Frank Harrell
>
>
> Jan Brogger wrote:
> >
> > After I sent the original mail, I found this in the Encyclopedia of
> > Biostatistics (2):
> >
> > "There is an ongoing debate as to whether the sample design must be
> > considered when deriving statistical models (as opposed to estimates of
> > means, proportions, totals, and ratios) based on sample survey data.
> > Analysts interested in using statistical techniques such as linear
> > regression, logistic regression, survival analysis, or categorical data
> > analysis on survey data are divided as to whether they feel it is
> > necessary to use specialized software. The model-based analysts argue
> > that, as long as the model is specified correctly, they can proceed
> > without recognizing aspects of the survey design (such as
> > stratification, clustering, and unequal selection probabilities), and
> > can therefore use standard statistical packages. The design-based
> > analysts argue to the contrary that it is important to account for the
> > survey design when estimating models. The debate between these two
> > factions has been ongoing for quite awhile and is not likely to be
> > resolved soon (Groves [14], Skinner et al. [29], Korn and Graubard
> > [22], Hansen et al. [16]). A compromise position adopted by some is to
> > use standard statistical software in modeling analyses, but to
> > incorporate into the model the variables that were used to define the
> > strata, the PSUs and the weights."
> >
> > Most epidemiologists are model builders, not population describers. If
> > you do a "once-and-for-all" multiple imputation, you can account for
> > many of the features of a two-stage survey (except that I don't know
> > about the clustering thing). Am I right?
> >
> > Small typo correction:
> >
> > "Case 5: instead of a simple random sample drawn from the
> > non-responders, draw a _stratified sample_ with differential sampling
> > probabilities, depending on Y."
> > should read
> > "Case 5: instead of a simple random sample drawn from the
> > non-responders, draw a _stratified sample_ from responders with
> > differential sampling probabilities, depending on Y."
> >
> > 1. Brogan DJ. Pitfalls of Using Standard Statistical Software Packages
> > for Sample Survey Data. In: Armitage P and Colton T, eds. Encyclopedia
> > of Biostatistics. Chichester: John Wiley & Sons Ltd, 1998.
> > http://www.fas.harvard.edu/~stats/survey-soft/blc_eob.html
> > 2. Carlson BL. Software for Statistical Analysis of Sample Survey Data.
> > In: Armitage P and Colton T, eds. Encyclopedia of Biostatistics.
> > Chichester: John Wiley & Sons Ltd, 1998.
> > http://www.fas.harvard.edu/~stats/survey-soft/donna_brogan.html
> >
> > Yours,
> > Jan Brogger
>
> --
> Frank E Harrell Jr              Prof. of Biostatistics & Statistics
> Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
> U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
>