Hello, does anyone know if there are plans for an update of Little and
Rubin's Missing Data book (aka Missing Data bible)? -- Possibly Rod or Don
if they are reading this... :)
Thanks,
cd
*** Note: NEW MAILING ADDRESS ***
____________________________________________________________
Constantine Daskalakis, ScD
Assistant Professor,
Biostatistics Section, Division of Clinical Pharmacology,
Thomas Jefferson University,
125 S. 9th St. #402, Philadelphia, PA 19107
Tel: 215-955-5695
Fax: 215-955-5681
Email: [email protected]
____________________________________________________________
From m.bohlig <@t> worldnet.att.net Sun Apr 22 00:48:03 2001
From: m.bohlig <@t> worldnet.att.net (E. Michael Bohlig)
Date: Sun Jun 26 08:24:58 2005
Subject: IMPUTE: Need help, imputing for dissertation (longish)
Message-ID: <[email protected]>
Imputers,
I am beginning to analyze data for my dissertation. The first thing I
have to do is deal with missing data. However, before I pose my
questions, let me give a brief description of my data. I am analyzing a
problem behavior scale for children and adolescents that consists of 120
items rated on a 3-point scale. Of these 120 items, 101 items make up 8
subscales and 10 of these double-load. The remaining 19 are not
assigned to any of the subscales. The number of items per subscale
ranges from 8 to 25. My sample size is 754. Of these, 24% are missing
one or more items with a maximum of 8 items missing for any one person
(2 respondents with missing data on 8 items). However, no more than
2.25% (17/754) of responses are missing on any given item. Within
subscale, the percent of items with missing data across persons ranges
from 10% (2/20 items) to 38% (3/8 items).
The data are quite skewed. Among the 101 items that are assigned to
subscales, 46 items were rated as not observed (response = 0) by 80% or
more of the respondents; 20 items were scored as not observed by at
least 90% of the respondents. The highest response category (response =
2) was not reported by any of the respondents for two of the 101 items.
Only 5% of the respondents used this category for about 45% of the items
and up to 10% of the respondents used the highest category on just over
70% of the items.
To deal with these missing data, I have been considering using multiple
imputation. I do not have access to S-Plus so I cannot use Schafer's
CAT macro. I have read, however, that imputation using MCMC methods can
be robust to violations of normality so using a normal model may provide
adequate results (SEMNET discussion list, Oct 2000; Schafer, 1997; PROC
MI Procedure documentation, SAS, 2000). Although the MI Procedure in
SAS is currently experimental, my data are already in SAS data sets so I
plan to use this software to impute the missing data.
After imputing my data I will be conducting confirmatory factor analyses
and item response theory analyses. The IRT model I will be implementing
is the Graded Response Model (Samejima, 1969) which generates one slope
parameter and a separate difficulty parameter for each threshold (number
of response options - 1). Since there are 3 response options on the
instrument, I will be estimating a total of 3 parameters per item.
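To make the parameter count concrete, here is a minimal sketch of the category probabilities under the Graded Response Model, assuming the logistic form of the model. The function name and the example values are hypothetical, chosen only for illustration; they are not estimates from my data.

```python
import math

def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    theta      -- latent trait value for one respondent (illustrative)
    slope      -- the item's single slope (discrimination) parameter
    thresholds -- increasing difficulty parameters, one per threshold
                  (2 thresholds for a 3-point item scored 0, 1, 2)
    """
    # Cumulative probabilities P(X >= k): 1 for k = 0, logistic curves for
    # each threshold, and 0 beyond the highest category.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Each category probability is the difference of adjacent cumulative ones.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical item: 1 slope + 2 thresholds = 3 parameters.
probs = grm_category_probs(theta=0.0, slope=1.5, thresholds=[0.5, 2.0])
print(probs)
```

With positive thresholds like these, most of the probability at theta = 0 falls in category 0, which is the kind of skew described above.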
Now for my questions.
1) Given the limited range of response options (0-2) and the skewed
nature of the data, is MCMC estimation under the assumption of
normality inappropriate?
2) Assuming that I can proceed with the imputation using a normal
model, should I impute within subscale, or should I impute using the
full instrument?
3) If I should impute within subscale, how do I deal with the items
that are assigned to more than one subscale?
4) After I have imputed the missing data and have several completed
data sets, how should I combine the parameter estimates in the IRT
analysis? I will have three parameters per item to estimate. How do I
determine the between-imputation variance, the within-imputation
variance, and the total variance? How do I determine the relative
efficiency?
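For context on question 4, the combining rules usually attributed to Rubin can be sketched as follows for a single parameter, such as one item's slope. This is a sketch under my reading of those rules (including the fraction-of-missing-information formula given in the SAS MI documentation), not a definitive implementation; the function name and the numbers are made up for illustration.

```python
from statistics import mean, variance

def pool_estimates(estimates, variances):
    """Pool one parameter across m imputed data sets.

    estimates -- the m point estimates (one per completed data set)
    variances -- the m squared standard errors of those estimates
    """
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # within-imputation variance
    between = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    if between == 0:
        # No variation across imputations: no information lost to missingness.
        frac_missing = 0.0
    else:
        r = (1 + 1 / m) * between / within      # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2         # degrees of freedom
        frac_missing = (r + 2 / (df + 3)) / (r + 1)
    rel_eff = 1 / (1 + frac_missing / m)        # relative efficiency
    return {"estimate": q_bar, "within": within, "between": between,
            "total": total, "frac_missing": frac_missing, "rel_eff": rel_eff}

# Made-up slope estimates and squared standard errors from m = 5 imputations.
pooled = pool_estimates([1.0, 1.2, 1.1, 0.9, 1.3], [0.04] * 5)
print(pooled)
```

The same function would be applied separately to each of the three parameters of every item.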
Any advice or words of wisdom will be greatly appreciated.
Thanks in advance,
Michael Bohlig