This would also introduce bias if there is any trend or pattern. I would prefer the strung-out format, using a multivariate technique to impute the missing values.
Raghu

-----Original Message-----
From: David Judkins <[email protected]>
Sender: Impute -- Imputations in Data Analysis <[email protected]>
Date: Tue, 14 Dec 2010 13:25:09
To: [email protected]<[email protected]>
Reply-To: Impute -- Imputations in Data Analysis <[email protected]>
Subject: Re: Imputation of short personal timeseries

Sounds like it might help the next time. I can do something a little like that in AutoImpute now. In it, I have an option where the donor is strongly (but not absolutely) constrained to have the same value on some variable. With the stacked format, I could set this "ForceVariable" equal to the person ID. This would have the effect of almost always filling in a missing time point from a reported time point by the same person. But I am concerned that this would lead to too little variability in each person's imputed history.

-----Original Message-----
From: Impute -- Imputations in Data Analysis [mailto:[email protected]] On Behalf Of Raghunathan, Trivellore
Sent: Tuesday, December 14, 2010 12:00 PM
To: [email protected]
Subject: Re: Imputation of short personal timeseries

The stacked format should do OK as well, but it will underestimate (and possibly bias) the covariance matrix used to perturb the regression coefficients in MICE and IVEWARE. The point estimates will be unbiased. I think MLWIN can handle a random-effects model for imputation. PAN, developed by Joe Schafer, might be another option. We are currently modifying IVEWARE to incorporate clustering by treating individuals as clusters and using the jackknife to estimate the regression coefficients and their covariance matrix. We are also working on random coefficient models in the sequential regression approach in IVEWARE. These won't solve your problem now!
Raghu

________________________________________
From: Impute -- Imputations in Data Analysis [[email protected]] On Behalf Of David Judkins [[email protected]]
Sent: Tuesday, December 14, 2010 11:46 AM
To: [email protected]
Subject: Re: Imputation of short personal timeseries

Yes, that is my first thought. I will call this the strung-out format as opposed to the stacked format. I am a little concerned about overfit with the strung-out format given the very large number of potential predictor variables for each variable. (We are imputing several dozen related series simultaneously.) With the stacked format, I think there would be less danger of overfit, but I worry that it would result in too much within-person variation over time. Ideally, I would base the imputations on personal growth models with time-varying covariates, but this seems like a tall order. I guess if any software can do it, MPLUS would be a good candidate.

-----Original Message-----
From: Impute -- Imputations in Data Analysis [mailto:[email protected]] On Behalf Of Raghunathan, Trivellore
Sent: Tuesday, December 14, 2010 11:29 AM
To: [email protected]
Subject: Re: Imputation of short personal timeseries

One option is to create one row per person by stringing the data from multiple waves and then MICE or IVEWARE can be used to impute jointly all the missing values in all the variables.

Raghu

________________________________________
From: Impute -- Imputations in Data Analysis [[email protected]] On Behalf Of Juned Siddique [[email protected]]
Sent: Tuesday, December 14, 2010 10:05 AM
To: [email protected]
Subject: Re: Imputation of short personal timeseries

Hi Dave,

Are the surveys repeated measurements on the same individuals? If so, you might want to look into Mplus.
-Juned

From: Impute -- Imputations in Data Analysis [mailto:[email protected]] On Behalf Of David Judkins
Sent: Friday, December 10, 2010 9:12 AM
To: [email protected]
Subject: Imputation of short personal timeseries

Does MICE or IVEware or some other package have special procedures for imputing missing waves in short time series of binary and/or ordered Likert item responses? I have a survey with 9 waves of data collection. Things like quarterly binary flags for alcohol consumption and Likert questions about severity of problems caused by alcohol consumption. I know that there was some research on this issue in connection with SIPP. There was a pair of JSM papers on the subject back in 1994. One paper was by my colleagues Rizzo, Kalton, and Brick. The other was by my former colleagues Folsom and Witt. But I am wondering if there is something more recent, as well as something more automated.

--Dave

David Judkins
Senior Scientist
Westat
1650 Research Boulevard
Rockville, MD 20850
(301) 315-5970
[email protected]
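
The two data layouts contrasted in this thread, "stacked" (one row per person per wave) and "strung-out" (one row per person, with each wave's items as separate columns), can be illustrated with a small reshaping sketch. This is only an illustration under stated assumptions: the function name `stacked_to_strung_out`, the column names `person_id`, `wave`, `drank`, and `severity`, and the toy records are all hypothetical and are not taken from the survey or from MICE, IVEware, or AutoImpute.

```python
from collections import defaultdict


def stacked_to_strung_out(records, id_key="person_id", wave_key="wave"):
    """Reshape stacked rows (one per person-wave) into one row per person.

    Each non-key item becomes a column named "<item>_w<wave>".
    Hypothetical helper for illustration only.
    """
    wide = defaultdict(dict)
    for rec in records:
        pid, wave = rec[id_key], rec[wave_key]
        wide[pid][id_key] = pid
        for name, value in rec.items():
            if name not in (id_key, wave_key):
                wide[pid][f"{name}_w{wave}"] = value
    # One dictionary per person, ordered by person ID.
    return [wide[pid] for pid in sorted(wide)]


# Hypothetical stacked data: None marks a missing response.
stacked = [
    {"person_id": 1, "wave": 1, "drank": 1, "severity": 3},
    {"person_id": 1, "wave": 2, "drank": None, "severity": None},
    {"person_id": 2, "wave": 1, "drank": 0, "severity": 1},
    {"person_id": 2, "wave": 2, "drank": 0, "severity": None},
]

strung_out = stacked_to_strung_out(stacked)
for row in strung_out:
    print(row)
```

In the strung-out layout every wave of every item is a separate variable, which is what lets a multivariate imputation routine use a person's other waves as predictors, and also why the number of potential predictors grows quickly with 9 waves and several dozen series. Note that this sketch only creates columns for person-waves that appear in the input; a wave with no row at all for a person would need to be added as missing values before imputation.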
