Raghu -- This is a great suggestion, thank you! What surprises me here is the suggestion that a chained-equation approach can solve this problem, but a multivariate normal approach cannot. I had thought the two were equivalent for normal data. It seems like a substantial advantage if the chained-equation approach can handle more difficult patterns of missingness.
Can you say a little about what gives the chained-equation approach this advantage?

Best wishes,
Paul von Hippel

On Tue, Sep 20, 2011 at 12:12 PM, Raghunathan, Trivellore <[email protected]> wrote:

There are two possible ways to conceptualize this problem and use one of the MI software packages. Suppose that R stands for reading and M stands for math; F, W, and S stand for Fall, Winter, and Spring; and the number stands for the year.

Option 1: Arrange the data as

Subject-A  RF1 RW1 RS1 MF1 MW1 MS1 RF2 RW2 RS2 MF2 MW2 MS2
Subject-B  RF2 RW2 RS2 MF2 MW2 MS2 RF3 RW3 RS3 MF3 MW3 MS3
Subject-C
Subject-D

This approach will create an n x 72 completed data matrix. You can drop the imputations in the non-administered portion of the data set for some analyses, or retain them, especially in cross-sectional analyses. The partial correlation between ab1 and cd3 will be practically zero when IVEware is used. We have tested this by using IVEware on a "file-matching" pattern of missing data.

Option 2:

Though not sure, one may be able to use the following structure under some assumptions:

Subject A: RF1 RW1 RS1 MF1 MW1 MS1 RF2 RW2 RS2 MF2 MW2 MS2  Year=1.5
Subject B: RF2 RW2 RS2 MF2 MW2 MS2 RF3 RW3 RS3 MF3 MW3 MS3  Year=2.5
Subject C: RF3 RW3 RS3 MF3 MW3 MS3 RF4 RW4 RS4 MF4 MW4 MS4  Year=3.5

Use year as a covariate and possibly some interactions. This makes assumptions about the stability of the regression relationships over time, and it assumes the residual covariance matrix has a common 12-by-12 block along the diagonal.

My own preference is to use Option 1 if the sample size is large and Option 2 if the sample size is small.

Interesting problem.

Raghu

------------------------------

From: Impute -- Imputations in Data Analysis [[email protected]] on behalf of Paul von Hippel [[email protected]]
Sent: Tuesday, September 20, 2011 10:53 AM
To: [email protected]
Subject: Re: Imputing panel data, constraining correlations at long lags

Thanks, I thought a little about this. It's not obvious to me what the prior would be. Any recommendations?

On Tue, Sep 20, 2011 at 9:41 AM, Juned Siddique <[email protected]> wrote:

Hi Paul,

If you use a Bayesian approach like Proc MI for the problem below, the posterior correlation between waves 1 and 3 is just the prior correlation. So one approach might be to use an informative prior for the covariance matrix, which you can do in Proc MI.

-Juned

------------------------------

From: Impute -- Imputations in Data Analysis [mailto:[email protected]] On Behalf Of Paul von Hippel
Sent: Tuesday, September 20, 2011 8:21 AM
To: [email protected]
Subject: Re: Imputing panel data, constraining correlations at long lags

Thanks, Dave. You've come up with a nicely simplified version of my problem. Suppose I had only three waves of data, with every subject missing either wave 1 (your pattern A) or wave 3 (your pattern B). Ordinarily I would put the data in wide format --

A  O1 O2 M3
B  M1 O2 O3

-- and impute using a multivariate normal model.
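(For concreteness, a minimal base-R sketch of that long-to-wide step on toy data; the long-format column names and values are assumptions for illustration, not something specified in the thread.)

    ## Toy long-format test scores: one row per student x year x season x test.
    long <- expand.grid(id = c("A", "B"), year = 1:2,
                        season = c("F", "W", "S"), test = c("R", "M"))
    long$score <- rnorm(nrow(long), mean = 200, sd = 10)

    ## Combine test, season, and year into one occasion label (RF1, MW2, ...),
    ## then reshape to one row per student and one column per occasion.
    long$occasion <- interaction(long$test, long$season, long$year, sep = "")
    wide <- reshape(long[, c("id", "occasion", "score")],
                    idvar = "id", timevar = "occasion",
                    v.names = "score", direction = "wide")

    ## With the full 12-year panel this yields the 2 tests x 3 seasons x 12 years
    ## = 72 score columns of Raghu's Option 1, with NA wherever a student was
    ## not tested.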
However, I don't think that would work in this case, because the MVN model would want to estimate the correlation between wave 1 and wave 3, and there are no cases where both wave 1 and wave 3 are observed.

However, if I could tell the software that this was, say, an AR(1) process -- or, equivalently, that the partial correlation between waves 1 and 3 is zero -- I'd be in business.

This could be done using MVN software that allowed me to impose constraints on the covariance matrix, or imputation software for serially correlated data. Does such software exist?

Best,
Paul

------------------------------

From: David Judkins <[email protected]>
To: [email protected]
Sent: Tuesday, September 20, 2011 7:25 AM
Subject: Re: Imputing panel data, constraining correlations at long lags

Paul,

This sounds pretty challenging. Reminds me of Andrew Gelman's JSM talk and 1998 JASA paper on imputation of questions not asked.

It also reminds me of a remark some speaker made this year at JSM about almost all natural processes being Markov chains. Not sure I buy that, but I think he meant that if you have a rich enough state vector, then one past observation is all you need. Of course, that would be trivially true if the state vector contained lagged latent values. In this case, I doubt your state vector is rich enough to compensate for the brevity of the student-level time series, but I guess you have to work with what you have.

Whatever you do, I imagine, will involve a lot of custom programming. However, you might be able to use Raghu's IVEware on a series of specially reshaped versions of your data. For example, to impute year 3 for subject A and year 1 for subject B, you might create a dataset with only A and B records in it, shaped like this:

A  O1 O2 M3
B  M1 O2 O3

Once that was done, you could proceed to imputing Year 4 on A and B records and Year 2 on C records with a dataset shaped from the A, B, and C records as

A  O2 I3 M4
B  O2 O3 M4
C  M2 O3 O4

And so on. At the end of that, you would have 4 observed/imputed years per subject.

There should then be a way to generalize to more than 4 per subject. Not very elegant, but it might work.

--Dave

------------------------------

From: Impute -- Imputations in Data Analysis [[email protected]] on behalf of Paul von Hippel [[email protected]]
Sent: Monday, September 19, 2011 5:58 PM
To: [email protected]
Subject: Imputing panel data, constraining correlations at long lags

I have panel data where different students are tested over overlapping 2-year periods:

- Subject A is observed for years 1 & 2.
- Subject B is observed for years 2 & 3.
- Subject C is observed for years 3 & 4.
- etc., up to year 12 (of school).

For each observed year there are three separate test occasions (fall, winter, spring) and two subjects (reading, math).

It seems to me I can impute the missing test scores provided I am willing to assume something about lags that are 2 years or longer. For example, I could assume that the partial correlation at lags of 2 years or longer is zero. This is not an unreasonable assumption, since the correlations at shorter lags are very strong (.8-.9).
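(As noted in the reply to Dave above, this is the same as treating the waves as an AR(1) process. A quick check in R, with 0.85 as an assumed value in the .8-.9 range:)

    rho <- 0.85                            # assumed short-lag correlation
    S   <- rho ^ abs(outer(1:3, 1:3, "-")) # 3 x 3 AR(1) correlation matrix
    P   <- solve(S)                        # precision (inverse correlation) matrix
    -P[1, 3] / sqrt(P[1, 1] * P[3, 3])     # partial cor(wave 1, wave 3 | wave 2): ~0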
Is there software that will allow me to do this conveniently?

My usual strategy is to reshape the data from long to wide and then impute using a multivariate normal model. There are several packages that will permit this; however, I am not aware of software that will let me constrain the covariance matrix in the way I have described.

I have not used imputation software that is tailored for panel data -- such as Schafer et al.'s PAN package, recently ported from S-Plus to R. I could try that, provided there is a convenient way to restrict the long lags.

Thanks!

--
Best wishes,
Paul von Hippel
Assistant Professor
LBJ School of Public Affairs
Sid Richardson Hall 3.251
University of Texas, Austin
2315 Red River, Box Y
Austin, TX 78712

mobile, preferred (614) 282-8963
office (512) 232-3650
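(Tying the thread together: a runnable sketch of the simplified three-wave example, imputed with chained equations. mice is used here only as a stand-in for IVEware, and the sample size, simulated values, and "norm" method are assumptions for illustration rather than anything recommended in the thread.)

    library(mice)
    set.seed(1)

    ## Toy version of the three-wave problem: half the students miss wave 1,
    ## half miss wave 3, so waves 1 and 3 are never jointly observed
    ## (a "file-matching" pattern). Scores follow an AR(1)-like process.
    n  <- 400
    y1 <- rnorm(n)
    y2 <- 0.85 * y1 + rnorm(n, sd = sqrt(1 - 0.85^2))
    y3 <- 0.85 * y2 + rnorm(n, sd = sqrt(1 - 0.85^2))
    dat <- data.frame(y1, y2, y3)
    dat$y1[1:(n / 2)]     <- NA   # pattern B: wave 1 missing
    dat$y3[(n / 2 + 1):n] <- NA   # pattern A: wave 3 missing

    ## Chained-equations imputation with wave 3 dropped from wave 1's model
    ## and vice versa -- the analogue of fixing the lag-2 partial correlation
    ## at zero, so each wave is imputed from its neighbor(s) only.
    pred <- make.predictorMatrix(dat)
    pred["y1", "y3"] <- 0
    pred["y3", "y1"] <- 0
    imp <- mice(dat, predictorMatrix = pred, method = "norm",
                m = 5, printFlag = FALSE)

    ## In the completed data sets, waves 1 and 3 are correlated only through
    ## wave 2 (roughly rho^2 here); in an unconstrained Bayesian MVN fit to
    ## this pattern that correlation would instead be driven by the prior,
    ## as Juned notes above.
    sapply(1:5, function(i) cor(complete(imp, i))["y1", "y3"])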
