Venita: you are confusing the standard deviation of the distribution of a
random variable (a parameter estimate) with the standard error of an
estimate (an estimate of the precision of a parameter estimate). The MI
formula to which you refer is for computing se's of estimates, but for
summarizing the distribution the sd, not the se, is the right measure.
(With complete data, if you were comparing spreads of distributions
estimated with different sample sizes, you would not want the se as the
measure of spread, since it depends on the sample size.)
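
To illustrate the sample-size point, here is a minimal Python sketch. It
assumes simulated normal data with a true sd of 10; the helper name
sd_and_se and the sample sizes are hypothetical, chosen only for
illustration. The sd stays near 10 at both sizes, while the se of the mean
shrinks roughly as 1/sqrt(n).

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def sd_and_se(n):
    """Draw n values from N(50, 10^2); return the sample sd and se of the mean."""
    sample = [random.gauss(50.0, 10.0) for _ in range(n)]
    sd = statistics.stdev(sample)   # describes the spread of the variable
    se = sd / math.sqrt(n)          # describes the precision of the mean
    return sd, se


sd_small, se_small = sd_and_se(100)
sd_large, se_large = sd_and_se(10000)

print(f"n=100:   sd={sd_small:.2f}  se={se_small:.3f}")
print(f"n=10000: sd={sd_large:.2f}  se={se_large:.3f}")
```

The two sd values are comparable across sample sizes; the two se values are
not, which is why the se is a poor summary of spread.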

The sd is a parameter like a mean or a regression coefficient, and the MI 
estimate of it is the average of the estimates from each data set. Rod
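
A minimal Python sketch of this pooling, assuming the M imputed copies of
one numeric variable are held as plain lists. The function name
pool_descriptives and the toy numbers are hypothetical. The sd is pooled by
simple averaging, while the se of the mean uses the Rubin (1987)
combination: average within-imputation variance plus (1 + 1/M) times the
between-imputation variance B.

```python
import math
import statistics


def pool_descriptives(imputed):
    """Pool the mean, sd and se of the mean over M imputed data sets."""
    m = len(imputed)
    means = [statistics.fmean(d) for d in imputed]
    sds = [statistics.stdev(d) for d in imputed]
    # Within-imputation variance of the mean: average of sd^2 / n.
    within = statistics.fmean([s ** 2 / len(d) for s, d in zip(sds, imputed)])
    # Between-imputation variance B = 1/(M-1) * sum (Q - Qbar)^2.
    between = statistics.variance(means)
    total = within + (1 + 1 / m) * between
    return {
        "mean": statistics.fmean(means),  # average of the M means
        "sd": statistics.fmean(sds),      # average of the M sds
        "se_mean": math.sqrt(total),      # precision of the pooled mean
    }


# Hypothetical toy data: the last value was missing and imputed 3 times.
imputed = [
    [10, 12, 14, 16],
    [10, 12, 14, 18],
    [10, 12, 14, 14],
]
pooled = pool_descriptives(imputed)
print(pooled)
```

The pooled sd goes in a descriptive table as the measure of spread; the
se_mean value belongs with the mean only when its precision is the quantity
of interest.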

  On Fri, 22 Oct 2004, DePuy, Venita wrote:

> Hi Rod et al-
> The within-imputation variance is just the mean of the variances from the M
> imputed data sets, but Rubin(1987) also gives a formula for the
> between-imputation variance for the estimate as the variance of the
> estimates . . ie if your point estimate of interest is Q and you've
> calculated Qbar as the mean of the M different Q's, the between imputation
> variance B is  1/(M-1)* sum(Q-Qbar)^2 . . then the total variance associated
> with Qbar is the average variance estimate plus (1+ 1/M)*B . . .
>
> Although personally I just use SAS's MIanalyze to do all that stuff.  But my
> question - if you're reporting full imputed values, why give the standard
> deviations, which are too low, instead of the standard errors, which include
> all the error?
>
> Thanks!
> -Venita
>
>> ----------
>> From:        Rod Little[SMTP:[email protected]]
>> Sent:        Friday, October 22, 2004 9:18 AM
>> To:  Howells, William
>> Cc:  DePuy, Venita; Balasubramani, G.K. ; [email protected]
>> Subject:     RE: [Impute] Multiply Imputation - Descriptive Stats
>>
>> If you want to report the means and standard deviations, you can just
>> average the means and standard deviations from the M imputed data sets.
>> This is more efficient than reporting values for one data set, and
>> averaging the imputes before computing the statistics will lead to an
>> underestimate of the standard deviation (as when conditional means are
>> imputed). Rod
>>
>>   On Thu, 21 Oct 2004, Howells, William wrote:
>>
>>> We've wondered about this ourselves and I haven't seen it covered in any
>>> text.  We also opted for reporting baseline stats on unimputed data
>>> because our missing data is mainly in one predictor variable, and
>>> indicate the observed n in a footnote or the table itself.  Bill
>>> Howells, Wash U Med School, St Louis
>>>
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of DePuy, Venita
>>> Sent: Thursday, October 21, 2004 11:19 AM
>>> To: 'Balasubramani, G.K. '; ''[email protected]' '
>>> Subject: RE: [Impute] Multiply Imputation - Descriptive Stats
>>>
>>> Hi Bala et al -
>>>
>>> In the various MI papers we work on in my group, we typically provide
>>> baseline descriptive stats for the unimputed group.  If that is not an
>>> option, consider using either the first imputed sample or the overall
>>> imputed values.  The overall MI mean for a value is merely the mean of
>>> the 5 (or however many) means, one from each dataset.
>>>
>>> However, you typically want to report a measure of variance.  For the
>>> unimputed or 1st imputed sample method, you can just use std dev.  For
>>> the overall imputed values, you need to use standard errors.
>>>
>>> Personally, I prefer using unimputed for the baseline descriptives and
>>> full imputation values in subsequent analyses . . . but I would say the
>>> main deciding factor is the amount of missingness in your data.  If it's
>>> very large, you will probably want to use imputed values.
>>>
>>> Hope this helps!
>>> Venita
>>>
>>> -----Original Message-----
>>> From: Balasubramani, G.K.
>>> To: '[email protected]'
>>> Sent: 10/21/2004 12:05 PM
>>> Subject: [Impute] Multiply Imputation - Descriptive Stats
>>>
>>> Hello all,
>>>
>>>
>>>
>>> This is a basic question in relation to imputation. The imputed
>>> variable is an outcome, the Hamilton depression rating scale.
>>> I am using a threshold to create an indicator of remission or no
>>> remission. After I impute the data (say 5 times), how do I show
>>> the descriptive statistics?  That is, the percentage with remission when
>>> the data include imputed values (e.g. sex with remission, employment
>>> status with remission, etc.). Can I take the mean of the 5 imputed data
>>> sets to create the indicator variable for remission? Is there any other
>>> way to present the descriptives using the imputed data?
>>>
>>>
>>>
>>> Thanks in advance.
>>>
>>>
>>>
>>> Bala
>>>
>>> _______________________________________________
>>> Impute mailing list
>>> [email protected]
>>> http://lists.utsouthwestern.edu/mailman/listinfo/impute
>>
>> ___________________________________________________________________________________
>> Roderick Little
>> Richard D. Remington Collegiate Professor of Biostatistics
>> U-M School of Public Health                 Tel (734) 936 1003
>> M4045 SPH II                                Fax (734) 763 2215
>> 1420 Washington Hgts                        email [email protected]
>> Ann Arbor, MI 48109-2029             http://www.sph.umich.edu/~rlittle/
>>
>
>
>
