[email protected] wrote:
> Good points.
> 
> I don't really have a note, just a proceedings paper on conditions under which
> nearest neighbour imputation leads to unbiased estimation, which I did not
> pursue further; it is from 1999, actually, not 1998. Predictive mean matching
> is a form of nearest neighbour imputation: predicting the mean is a rescaling,
> and matching applies the distance function.
> 
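To make the description above concrete, here is a minimal sketch of predictive mean matching viewed as nearest neighbour imputation. It assumes a single fully observed covariate x, an ordinary least squares line as the predictive model, and the absolute difference in predicted means as the distance. The function names are hypothetical, and no regression parameters are drawn, so this is single rather than multiple imputation. With one covariate the fitted line is only a rescaling of x, so the donor chosen is simply the nearest neighbour in x.

```python
import statistics


def fit_ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pmm_impute(x, y):
    """Fill None entries of y by predictive mean matching on one covariate x.

    The regression only turns x into a predicted mean; each missing y gets the
    observed y of the donor whose predicted mean is closest in absolute value.
    """
    obs = [i for i, v in enumerate(y) if v is not None]
    a, b = fit_ols([x[i] for i in obs], [y[i] for i in obs])
    pred = [a + b * xi for xi in x]
    out = list(y)
    for i, v in enumerate(y):
        if v is None:
            # Nearest donor on the predicted-mean scale (ties go to the first donor)
            donor = min(obs, key=lambda j: abs(pred[j] - pred[i]))
            out[i] = y[donor]
    return out
```

For example, `pmm_impute([1, 2, 3.2, 4, 6], [1.0, 2.1, None, 3.9, None])` fills both gaps with 3.9, the observed value at x = 4, because that donor's predicted mean is nearest for both recipients.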

By "note" I just meant our e-mail exchange.

For the moment I'm wondering about the bias in standard errors, more so 
than bias in the regression coefficients.

> I think PMM can be viewed as a regression with an added residual, because the
> imputed value is not directly "on the regression line", that is, it is not the
> expected value but the expected value plus something (only an exact match
> would leave no residual, which is unlikely, and even less likely with high
> nonresponse rates). Also, I guess I was implicitly assuming that the variance
> is non-constant, with an impact large enough to be seen in cells with high
> nonresponse. Maybe the variance can be assumed constant, but at least a
> residual should be assumed to be present for variance estimation.

I think I see your point, but Rubin's formula for variance just looks at 
variation of the regression coefficients across multiple imputations, 
plus the usual variance, so I can't see where there is an opportunity to 
correct for imperfect matching.
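For reference, Rubin's combining rules can be sketched as follows. Given m completed-data estimates q_1..q_m of a coefficient and their squared standard errors u_1..u_m, the total variance is T = W + (1 + 1/m)B, where W is the mean within-imputation variance and B is the between-imputation sample variance of the estimates. The function name is hypothetical.

```python
import statistics


def rubin_combine(estimates, variances):
    """Combine m completed-data results by Rubin's rules; returns (qbar, T)."""
    m = len(estimates)
    qbar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)      # within-imputation variance W
    b = statistics.variance(estimates)   # between-imputation variance B (m - 1 denominator)
    t = w + (1 + 1 / m) * b              # total variance T
    return qbar, t
```

Nothing in T refers to how far a recipient's predicted mean was from its donor's, so any effect of imperfect matching can enter only through the spread B of the estimates across imputations.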

> 
> With nearest neighbour imputation, Burns (1990) had seen problems with using 
> the first two neighbours for variance estimation and we had reasonable 
> results assuming it is a regression with added residuals in Lee, Rancourt and 
> Särndal (1994). Maybe these apply only for NN and not for PMM, but I thought
> I might provide a lead...

If you are going to add residuals, I think you might as well stick with 
regression imputation and omit the PMM step.
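Regression imputation with an added residual can be sketched as below. It assumes one fully observed covariate, an OLS line, and normal residuals with the estimated residual standard deviation. For brevity it does not draw the regression parameters, so on its own it would not be a proper multiple imputation. The function name is hypothetical.

```python
import math
import random
import statistics


def stochastic_regression_impute(x, y, rng):
    """Fill None entries of y with a + b*x plus a normal residual (one covariate)."""
    obs = [i for i, v in enumerate(y) if v is not None]
    xs = [x[i] for i in obs]
    ys = [y[i] for i in obs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys)) / sxx
    a = my - b * mx
    # Residual SD with n - 2 degrees of freedom (intercept and slope estimated)
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(xs, ys))
    s = math.sqrt(ssr / (len(obs) - 2))
    out = list(y)
    for i, v in enumerate(y):
        if v is None:
            # Imputed value = point on the fitted line + a random normal residual
            out[i] = a + b * x[i] + rng.gauss(0.0, s)
    return out
```

Here the added residual is explicit and drawn from the assumed error distribution, whereas in PMM it is whatever residual the donor happens to carry.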

> 
> I have no problem with the posting of these exchanges.

Thanks Eric,
Frank

> 
> Eric
> 
> -----Original Message-----
> From: Frank E Harrell Jr [mailto:[email protected]]
> Sent: 16 January 2007 15:09
> To: [email protected]
> Subject: Re: RE: [Impute] SEs of regression coefficients after predictive
> mean matching
> 
> 
> [email protected] wrote:
>> For any variant of the donor method, there has to be compactness (usually
>> achieved when nonresponse is not too frequent) for the method to lead to
>> asymptotic unbiasedness. I have often found that pretending we are in the
>> presence of regression does a good job when the data are compact.
>> Otherwise, the implicitly imputed added residuals have to be boosted for
>> variance calculation purposes. It is as if unit i does not receive its own
>> residual but rather the residual e_j of the donor, unit j (which is at a
>> different point on the "x line" and therefore has a slightly different
>> conditional distribution). So the residual has to be increased by the
>> difference between the expected values of y at points i and j. I don't
>> recall much on this in the literature, but I have an embryo of it in the
>> 1998 ASA SRMS proceedings.
>>
>> Eric Rancourt
>> Statistics Canada
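Eric's argument can be written out as follows, assuming y = μ(x) + e with E(e | x) = 0 and Var(e | x) = σ²(x); the notation μ, σ² and e*_ij is introduced here for illustration only. Recipient i receives donor j's observed value, which decomposes as:

```latex
y_j = \mu(x_j) + e_j
    = \mu(x_i) + \underbrace{e_j + \{\mu(x_j) - \mu(x_i)\}}_{e_{ij}^{*}}

E\left[(e_{ij}^{*})^2 \mid x_i, x_j\right] = \sigma^2(x_j) + \{\mu(x_j) - \mu(x_i)\}^2
```

So the implicit residual is the donor's own residual shifted by the difference in expected values, and its mean square exceeds σ²(x_j) by the square of that difference; the shift vanishes only for an exact match.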
> 
> Thanks very much for your note.  When you are ready would you mind 
> posting your note and this response to the list?
> 
> There are two things unclear about your note.  First, PMM does not use 
> residuals in any way but PMM does need to inherit uncertainty in the 
> regression equation used for predicting the target variable.  Second, 
> the conditional distribution of the target might be assumed to have 
> constant variance, so the residuals should be exchangeable.  For 
> sample sizes that are not large, the residuals actually have some correlation
> and unequal variance, but I think these can be ignored.  So I'm not clear
> on why you would need to talk about position on the x line.
> 
> Thanks for the discussion and ideas,
> Frank
> 
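The requirement that PMM inherit the uncertainty in the regression equation is commonly met by matching on predicted means from a line drawn from its posterior distribution rather than from the fitted line itself. Below is a minimal sketch for one covariate, assuming the standard noninformative prior p(a, b, σ²) ∝ 1/σ²: draw σ*² = SSR/χ²(n − 2), then draw the intercept at the mean of x and the slope independently from their normal posteriors. The function name is hypothetical, and this is not necessarily the algorithm used in the simulations described in this thread.

```python
import math
import random
import statistics


def draw_line(xs, ys, rng):
    """One posterior draw (a*, b*, sigma*) for y = a + b*x under p(a, b, s^2) ~ 1/s^2."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b_hat = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    ssr = sum((y - my - b_hat * (x - mx)) ** 2 for x, y in zip(xs, ys))
    # sigma*^2 = SSR / chi-square(n - 2); chi-square(k) is Gamma(shape k/2, scale 2)
    sigma = math.sqrt(ssr / rng.gammavariate((n - 2) / 2, 2))
    # With x centred, the intercept and slope are independent given sigma
    center = rng.gauss(my, sigma / math.sqrt(n))
    b = rng.gauss(b_hat, sigma / math.sqrt(sxx))
    return center - b * mx, b, sigma
```

Each imputation would call this once and then match recipients to donors on a* + b*·x, so the donor pool shifts from one imputation to the next.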
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On behalf of Frank E Harrell Jr
>> Sent: 16 January 2007 12:41
>> To: [email protected]
>> Subject: [Impute] SEs of regression coefficients after predictive
>> mean matching
>>
>>
>> In one set of simulation experiments I am finding that the Rubin
>> variance-covariance formula works very well for regression imputation 
>> but that the standard error of the final regression coefficient for a 
>> frequently missing target variable is very much underestimated if PMM is 
>> used.  Does anyone have experience with this or know of a pertinent 
>> reference?  In doing PMM I have used both the closest match as part of 
>> the random-draw multiple imputation algorithm, and I have also tried 
>> weighted sampling where the closest match has the highest probability of 
>> being selected but donors around the closest may be selected with 
>> decreasing probability as they are farther away from the closest match. 
>>   Missingness of the target variable is moderately strongly related to 
>> observed values of another covariate (that has no missings).
>>
>> Thanks
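Weighted sampling of donors, with the closest match most likely to be chosen and farther donors chosen with decreasing probability, can be sketched as below. The exponential decay in distance beyond the closest match, the `scale` parameter, and the function name are assumptions chosen for illustration; the message does not say which weights were used.

```python
import math
import random


def weighted_donor(pred_i, donor_preds, donor_values, scale, rng):
    """Draw a donor value with probability decreasing in distance from the closest match."""
    dists = [abs(p - pred_i) for p in donor_preds]
    d_min = min(dists)
    # Closest donor gets weight 1; others decay exponentially (hypothetical weighting)
    weights = [math.exp(-(d - d_min) / scale) for d in dists]
    return rng.choices(donor_values, weights=weights, k=1)[0]
```

As `scale` shrinks toward zero this reduces to taking the closest match; larger values spread the draws over more distant donors.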
> 
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
