David Judkins wrote:
> Frank,
> 
> Well, I am glad that I conditioned my statement to refer to software
> known to me.
> 
> This past summer, some co-workers and I presented some testing on a
> really pathological joint distribution.  Would you be interested in
> testing your aregImpute function on it?
> 
> --Dave  

Yes

Thanks Dave
Frank

> 
> -----Original Message-----
> From: Frank E Harrell Jr [mailto:[email protected]] 
> Sent: Wednesday, January 02, 2008 11:48 AM
> To: David Judkins
> Cc: Alan Zaslavsky; [email protected];
> [email protected]
> Subject: Re: [Impute] Rounding option on PROC MI and choosing a final MI
> dataset
> 
> David Judkins wrote:
>> Raquel,
>>
>> Your problem is typical of the class of problems that I have been
>> working on for about 15 years now.  You can look up my imputation
>> papers in the CIS.  None of the currently available (free or
>> marketed) software solutions known to me are designed to preserve
>> the structure of general multivariate data.  The ones that build
>> models of multivariate relationships are mostly designed for either
>> normal or binary data.  Programs designed for general data are
>> usually designed to impute a single variable at a time and generally
>> fail to preserve multivariate structure.  If you have the luxury of
>> a large programming budget, you could program the algorithms that
>> some of us here at Westat have developed and published.
> 
> David,
> 
> In theory you are correct, but I think your note slightly misses the
> point.  It is amazing how well the chained equations approach of MICE
> and my aregImpute function work, given that they were not designed to
> preserve multivariate structure.  And they make fewer assumptions.
> I am particularly dubious about any methods that assume linearity and
> multivariate normality.
> 
> aregImpute uses Fisher's optimum scoring algorithm to impute nominal 
> variables.  If predictive mean matching is used with aregImpute (a more 
> nonparametric approach not available with your multivariate approach), 
> the distribution of imputed categories is quite sensible.
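> 
> To make that concrete, here is a minimal sketch of the kind of call I
> mean, using the Hmisc package in R (the data frame d and its variable
> names are hypothetical):
> 
>   library(Hmisc)
>   set.seed(1)
>   # d has continuous x1, nominal x2, and outcome y, each with some NAs
>   a <- aregImpute(~ x1 + x2 + y, data=d, n.impute=5, type='pmm')
>   # fit the analysis model on each completed dataset and pool the
>   # results with Rubin's rules
>   f <- fit.mult.impute(y ~ x1 + x2, lm, a, data=d)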
> 
> Frank Harrell
> 
>> As Alan replied, however, given that all your individual item
>> nonresponse rates are low, perhaps one of the available solutions
>> would work reasonably well for you.
>>
>> It sounds as if you don't have any skip patterns.  If so, you could
>> just impute the mode for each variable.  A second solution that is
>> only a little more complicated would be to independently impute each
>> variable by a simple hotdeck.  Either way, you end up with 100%
>> complete vectors.  You don't have to do any rounding.  All variables
>> have permissible values.  You will have better marginal distributions
>> with independent hotdecks than you get by imputing modes.
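>>
>> A rough sketch of both ideas in R (dat is a hypothetical data frame
>> of items coded 1 to 5, with NAs for nonresponse):
>>
>>   # mode imputation: fill each item with its most common value
>>   impute.mode <- function(x) {
>>     x[is.na(x)] <- as.numeric(names(which.max(table(x))))
>>     x
>>   }
>>   # independent hotdeck: fill each missing value with a draw from
>>   # the observed values of the same item
>>   impute.hotdeck <- function(x) {
>>     x[is.na(x)] <- sample(x[!is.na(x)], sum(is.na(x)), replace=TRUE)
>>     x
>>   }
>>   dat[] <- lapply(dat, impute.hotdeck)   # or impute.mode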
>>
>> But neither solution protects multivariate structure.  Here is a
>> slightly more complicated solution that tries to do that but is
>> still fairly simple:
>>
>> Pick a single variable as the most important for your analyses.  Call
>> it Y.  Let S be the maximal set of variables with zero item
>> nonresponse.  Build the best model for Y in terms of S that you can.
>> (It doesn't have to be a linear model.)  Output predicted values of Y
>> for the whole sample.  Call them Ypred.  Let O be the maximal set of
>> cases with zero nonresponse on all variables.  For each case with one
>> or more missing values, find its nearest neighbor in O on Ypred.  So
>> then you have a donor case and a recipient case.  Let X1i,...,Xpi be
>> the variables on recipient case i with missing values, and let
>> X1j,...,Xpj be the corresponding variables on the donor case.  Impute
>> Xki = Xkj for k = 1,...,p.
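>>
>> A rough sketch of that scheme in R (dat is a hypothetical data frame
>> and S a hypothetical character vector naming the fully observed
>> variables):
>>
>>   # model Y on the fully observed variables; lm stands in for
>>   # whatever model you prefer
>>   fit   <- lm(Y ~ ., data=dat[, c('Y', S)])
>>   ypred <- predict(fit, newdata=dat)            # Ypred for every case
>>   O <- which(complete.cases(dat))               # potential donors
>>   for (i in which(!complete.cases(dat))) {
>>     j <- O[which.min(abs(ypred[O] - ypred[i]))] # nearest on Ypred
>>     miss <- which(is.na(dat[i, ]))
>>     dat[i, miss] <- dat[j, miss]                # donor fills recipient
>>   }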
>>
>> To the extent that the variables in S are good predictors of Y and to
>> the extent that the other variables are related to Y, you should get
>> slightly better preservation of covariances than with independent
>> hotdecks.  There are many variants on this theme.  You will still have
>> some fading of multivariate structure, however.  And you will
>> under-estimate post-imputation variances. 
>>
>> For combining hotdecks with multiple imputation, see the exciting new
>> papers by Siddique and Belin and by Little, Yosef, Cain, Nan, and
>> Harlow, both in the first issue of volume 27 of Statistics in
>> Medicine.
>>
>>
>> --Dave  
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Alan
>> Zaslavsky
>> Sent: Wednesday, January 02, 2008 10:07 AM
>> To: [email protected]; [email protected]
>> Subject: [Impute] Rounding option on PROC MI and choosing a final MI
>> dataset
>>
>>
>>> From: "Raquel Hampton" <[email protected]>
>>> Subject: [Impute] Rounding option on PROC MI and choosing a final MI
>>>     dataset
>>> My first question is: there is a round option for PROC MI, but I read
>>> in an article (Horton, N.J., Lipsitz, S.P., & Parzen, M. (2003). A
>>> potential for bias when rounding in multiple imputation. The American
>>> Statistician, 57(4), 229-232) that using the round option for
>>> categorical data (the items have nominal responses, ranging from 1 to
>>> 5) produces biased estimates, even though the rounded values are
>>> logical.  So what can be done?  I only have access to SAS and Stata,
>>> but I am not very familiar with Stata.  Will this not be such a
>>> problem since the proportion of missing for each individual item is
>>> small?
>> Do you really mean nominal (unordered categories, like French, German,
>> English, or chocolate, vanilla, strawberry) or ordinal (like poor,
>> fair, good, excellent)?  If nominal, you won't get anything sensible
>> by fitting a normal model and rounding.  If ordinal and well
>> distributed across the categories, the bias of using rounded data
>> will be less than with the binomial data primarily considered by the
>> Horton et al. article.
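>>
>> A toy R sketch of the binomial case they consider, just to show the
>> direction of the effect (the numbers here are made up):
>>
>>   set.seed(2008)
>>   y <- rbinom(1e5, 1, 0.2)            # true proportion is 0.20
>>   # pretend the imputations came from a fitted normal model
>>   imp <- rnorm(1e5, mean(y), sd(y))
>>   mean(pmin(pmax(round(imp), 0), 1))  # about 0.23: biased upward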
>>
>> You might also consider whether it is necessary to round at all --
>> depends on how the data will be used in further analyses.
>>
>> With only a couple of percent missing on each item, all of the issues
>> about imputation become less crucial, although as noted in a previous
>> response you should definitely run the proper MI analysis to verify
>> that the between-imputation contribution to variance is small.  In
>> practice, any modeling exercise is a compromise that puts more effort
>> into the important aspects of the modeling, and in this case that
>> might not require doing the most methodologically advanced things
>> with the imputation.
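>>
>> As a minimal sketch of that check in R (qhat and u are hypothetical
>> vectors holding the m per-imputation point estimates and their
>> squared standard errors):
>>
>>   m <- length(qhat)
>>   W <- mean(u)                  # within-imputation variance
>>   B <- var(qhat)                # between-imputation variance
>>   tot <- W + (1 + 1/m) * B      # total variance (Rubin's rules)
>>   (1 + 1/m) * B / tot           # fraction of variance due to
>>                                 # missingness; small is reassuring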
>>
>> _______________________________________________
>> Impute mailing list
>> [email protected]
>> http://lists.utsouthwestern.edu/mailman/listinfo/impute
>>
> 
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
