Re: [Impute] Rounding option on PROC MI and choosing a final MI dataset

Frank E Harrell Jr Thu, 10 Jan 2008 15:28:58 -0800

David Judkins wrote:

Frank,


Well, I am glad that I conditioned my statement to refer to software
known to me.

This past summer, some co-workers and I presented some testing on a
really pathological joint distribution.  Would you be interested in
testing your aregImpute function on it?

--Dave


Yes

Thanks Dave
Frank


-----Original Message-----

From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 02, 2008 11:48 AM
To: David Judkins
Cc: Alan Zaslavsky; impute@lists.utsouthwestern.edu;
[EMAIL PROTECTED]
Subject: Re: [Impute] Rounding option on PROC MI and choosing a final MI
dataset

David Judkins wrote:

Raquel,

Your problem is typical of the class of problems that I have been
working on for about 15 years now.  You can look up my imputation
papers in the CIS.  None of the currently available (free or marketed)
software solutions known to me are designed to preserve the structure
of general multivariate data.  The ones that build models of
multivariate relationships are mostly designed for either normal or
binary data.  Programs designed for general data are usually designed
to impute a single variable at a time and generally fail to preserve
multivariate structure.  If you have the luxury of a large programming
budget, you could program the algorithms that some of us here at Westat
have developed and published.


David,

In theory you are correct, but I think your note slightly misses the
point.  It is amazing how well the chained equations approach of MICE
and my aregImpute function work, given they were not designed to
preserve the multivariate structure.  And they make fewer assumptions.
I am particularly dubious about any methods that assume linearity and
multivariate normality.

aregImpute uses Fisher's optimum scoring algorithm to impute nominal
variables.  If predictive mean matching is used with aregImpute (a more
nonparametric approach not available with your multivariate approach),
the distribution of imputed categories is quite sensible.


Frank Harrell

As Alan replied, however, given that all your individual item rates are
low, perhaps one of the available solutions would work reasonably well
for you.

It sounds as if you don't have any skip patterns.  If so, you could
just impute the mode for each variable.  A second solution that is only
a little more complicated would be to independently impute each
variable by a simple hotdeck.  Either way, you end up with 100%
complete vectors.  You don't have to do any rounding.  All variables
have permissible values.  You will have better marginal distributions
with independent hotdecks than you get by imputing modes.
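
A minimal Python sketch of the two simple options above, using only the
standard library.  The item values, the use of None to mark a missing
response, and the function names are illustrative assumptions; this is
not how PROC MI or any other package implements them.

```python
import random
from collections import Counter


def impute_mode(values):
    """Replace each missing value (None) with the most common observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def impute_hotdeck(values, rng):
    """Replace each missing value (None) with a random draw from the observed values."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


# Hypothetical 1-to-5 item with two missing responses.
item = [1, 2, 2, 3, None, 5, 2, None, 4, 2]
rng = random.Random(0)  # fixed seed so the draw is repeatable

mode_filled = impute_mode(item)
deck_filled = impute_hotdeck(item, rng)
```

Both versions return only values that were actually observed, so no
rounding is needed.  Mode imputation piles every missing case onto one
category, while the hotdeck draws spread them in proportion to the
observed frequencies, which is why the marginal distribution holds up
better.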

But neither solution protects multivariate structure.  Here is a bit
more complicated solution that tries to do that but is still fairly
simple:

Pick a single variable as the most important for your analyses.  Call
it Y.  Let S be the maximum set of variables with zero item
nonresponse.  Build the best model for Y in terms of S that you can.
(Doesn't have to be a linear model.)  Output predicted values of Y for
the whole sample.  Call them Ypred.  Let O be the maximum set of cases
with zero nonresponse on all variables.  Find the nearest neighbor in O
for each case with one or more missing values.  So then you have a
donor case and a recipient case.  Let X1i,...,Xpi be the set of
variables on recipient case i with missing values.  Let X1j,...,Xpj be
the corresponding set of variables on the donor case.  Impute Xki=Xkj
for k=1,...,p.
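
The steps above can be sketched in Python roughly as follows.  Several
things here are assumptions made for illustration: S is a single fully
observed variable ("age"), the model for Y is a one-predictor least
squares line fitted on the cases where Y is observed, "nearest" is read
as smallest absolute difference in Ypred, and None marks a missing
value.  The data and names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def nn_hotdeck(rows, y, s):
    """Fill missing values in each recipient from its nearest complete donor."""
    # Model Y in terms of S using the cases where Y is observed.
    obs = [r for r in rows if r[y] is not None]
    a, b = fit_line([r[s] for r in obs], [r[y] for r in obs])
    # Ypred for the whole sample.
    ypred = [a + b * r[s] for r in rows]
    # O: cases with no missing values on any variable.
    donors = [i for i, r in enumerate(rows) if None not in r.values()]
    out = [dict(r) for r in rows]
    for i, r in enumerate(rows):
        missing = [k for k, v in r.items() if v is None]
        if not missing:
            continue
        # Donor j is the complete case whose Ypred is closest to case i's.
        j = min(donors, key=lambda d: abs(ypred[d] - ypred[i]))
        for k in missing:
            out[i][k] = rows[j][k]  # impute Xki = Xkj
    return out


rows = [
    {"age": 20, "y": 1, "x1": 1, "x2": 2},
    {"age": 30, "y": 2, "x1": 2, "x2": 2},
    {"age": 40, "y": 3, "x1": 3, "x2": 4},
    {"age": 50, "y": 4, "x1": 5, "x2": 5},
    {"age": 31, "y": None, "x1": 2, "x2": None},
    {"age": 49, "y": 4, "x1": None, "x2": 5},
]
filled = nn_hotdeck(rows, y="y", s="age")
```

Because all of a recipient's missing items come from the same donor,
the donor's joint pattern on those items is carried over intact, which
independent hotdecks do not do.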

To the extent that the variables in S are good predictors of Y and to
the extent that the other variables are related to Y, you should get
slightly better preservation of covariances than with independent
hotdecks.  There are many variants on this theme.  You will still have
some fading of multivariate structure, however.  And you will
under-estimate post-imputation variances.

For combining hotdecks with multiple imputation, see the exciting new
papers by Siddique and Belin and by Little, Yosef, Cain, Nan, and
Harlow, both in the first issue of volume 27 of Statistics in Medicine.

--Dave

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alan
Zaslavsky
Sent: Wednesday, January 02, 2008 10:07 AM
To: impute@lists.utsouthwestern.edu; [EMAIL PROTECTED]
Subject: [Impute] Rounding option on PROC MI and choosing a final MI
dataset

From: "Raquel Hampton" <[EMAIL PROTECTED]>
Subject: [Impute] Rounding option on PROC MI and choosing a final MI
        dataset
My first question is: there is a round option for PROC MI, but I read
in an article (Horton, N.J., Lipsitz, S.P., & Parzen, M. (2003). A
potential for bias when rounding in multiple imputation. The American
Statistician 57(4), 229-232) that using the round option for
categorical data (the items have nominal responses, ranging from 1 to
5) produces biased estimates, though logical.  So what can be done?  I
only have access to SAS and STATA, but I am not very familiar with
STATA.  Will this not be such a problem since the proportion of missing
for each individual item is small?

Do you really mean nominal (unordered categories, like French, German,
English, or chocolate, vanilla, strawberry) or ordinal (like poor,
fair, good, excellent)?  If nominal, you won't get anything sensible by
fitting a normal model and rounding.  If ordinal and well distributed
across the categories, the bias of using rounded data will be less than
with the binomial data primarily considered by the Horton et al.
article.
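
A rough illustration of why rounding can bias a binary item, not a
reproduction of the calculations in Horton et al.  It assumes, as a
simplification, that imputed values are drawn from a normal
distribution with the item's mean p and variance p(1-p) and are then
rounded to 0 or 1 at a cutoff of 0.5.  The function name is
hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rounded_share(p):
    """Expected share of 1s after rounding normal draws with mean p, variance p(1-p)."""
    draw = NormalDist(mu=p, sigma=sqrt(p * (1 - p)))
    return 1 - draw.cdf(0.5)  # probability that a draw rounds up to 1


for p in (0.05, 0.2, 0.5):
    print(p, round(rounded_share(p), 4))
```

Under these assumptions the rounded imputations reproduce the true
proportion only at p = 0.5.  Away from that point the share of 1s among
rounded values differs from p, while the unrounded draws still have
mean p.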

You might also consider whether it is necessary to round at all --
depends on how the data will be used in further analyses.

With only a couple of percent missing on each item, all of the issues
about imputation become less crucial, although as noted in a previous
response you should definitely run the proper MI analysis to verify
that the between-imputation contribution to variance is small.  In
practice any modeling exercise is a compromise involving putting more
effort into the important aspects of the modeling and in this case this
might not require doing the most methodologically advanced things with
the imputation.
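
One way to check the between-imputation contribution is Rubin's
combining rules: with m completed datasets, the total variance is
T = W + (1 + 1/m) B, where W is the average within-imputation variance
and B is the variance of the m point estimates.  The sketch below is in
Python; the five estimates and variances are made-up numbers, and the
function name is hypothetical.

```python
from statistics import mean, variance


def pool(estimates, variances):
    """Combine m completed-data estimates using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(variances)           # within-imputation variance
    b = variance(estimates)       # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b       # total variance
    between_share = (1 + 1 / m) * b / t
    return q_bar, w, b, t, between_share


# Hypothetical results from m = 5 imputed datasets.
estimates = [2.0, 2.1, 1.9, 2.0, 2.0]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]
q_bar, w, b, t, between_share = pool(estimates, variances)
```

A small between_share says the imputations add little uncertainty to
this estimate, which is the situation described above when only a
couple of percent of each item is missing.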

_______________________________________________
Impute mailing list
Impute@lists.utsouthwestern.edu
http://lists.utsouthwestern.edu/mailman/listinfo/impute




--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

