Raghu -- This is a great suggestion, thank you!

What surprises me here is the suggestion that a chained-equation approach
can solve this problem, but a multivariate normal approach cannot. I had
thought the two were equivalent for normal data. It seems like a substantial
advantage if the chained-equation approach can handle more difficult
patterns of missingness.

Can you say a little about what gives the chained-equation approach this
advantage?
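
For concreteness, here is the sort of restriction I imagine a
chained-equation routine being able to impose -- just a sketch in R with the
mice package on a toy three-wave data set (the names y1-y3 and the whole
setup are made up, and I may well be misreading how IVEware actually works):

  library(mice)
  set.seed(1)
  n  <- 100
  y2 <- rnorm(n)                                                    # wave 2 observed for everyone
  y1 <- ifelse(seq_len(n) <= n/2, 0.9 * y2 + rnorm(n, 0, 0.4), NA)  # wave 1 observed for half the cases
  y3 <- ifelse(seq_len(n) >  n/2, 0.9 * y2 + rnorm(n, 0, 0.4), NA)  # wave 3 observed for the other half
  dat  <- data.frame(y1, y2, y3)
  pred <- make.predictorMatrix(dat)
  pred["y1", "y3"] <- 0   # wave 3 never predicts wave 1 directly ...
  pred["y3", "y1"] <- 0   # ... and wave 1 never predicts wave 3, i.e., waves 1
                          # and 3 are conditionally independent given wave 2
  imp <- mice(dat, method = "norm", predictorMatrix = pred, m = 5, seed = 1)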

Best wishes,
Paul von Hippel

On Tue, Sep 20, 2011 at 12:12 PM, Raghunathan, Trivellore <[email protected]> wrote:

>  There are two possible ways to conceptualize this problem and use one of
> the MI software packages. Suppose that R stands for reading and M stands for
> math; F, W, and S stand for Fall, Winter, and Spring; and the number stands
> for the year.
>
> Option 1: Arrange the data as
>
> Subject-A  RF1 RW1 RS1 MF1 MW1 MS1  RF2 RW2 RS2 MF2 MW2 MS2
> Subject-B                           RF2 RW2 RS2 MF2 MW2 MS2  RF3 RW3 RS3 MF3 MW3 MS3
> Subject-C                                                    RF3 RW3 RS3 MF3 MW3 MS3  RF4 RW4 RS4 MF4 MW4 MS4
> Subject-D                                                                             ... and so on
>
> This approach will create an n x 72 completed data matrix. You can drop the
> imputations in the non-administered portion of the data set for some
> analyses, or retain them, especially in cross-sectional analyses. The
> partial correlation between any year-1 variable (ab1) and any year-3
> variable (cd3) will be practically zero when IVEware is used. We have tested
> this by using IVEware on a "file-matching" pattern of missing data.
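>
> As a sketch of the reshaping step in R (base reshape; the long file and its
> column names id, year, RF, RW, RS, MF, MW, MS are hypothetical), the wide
> file produced here could then go to IVEware or any other MI routine:
>
>   # toy long file: one row per student-year, two tested years per student
>   set.seed(1)
>   start <- rep(1:11, length.out = 50)               # first tested year for each of 50 students
>   long  <- data.frame(id = rep(1:50, each = 2),
>                       year = rep(start, each = 2) + 0:1)
>   long[c("RF", "RW", "RS", "MF", "MW", "MS")] <- as.data.frame(matrix(rnorm(100 * 6), 100, 6))
>
>   wide <- reshape(long, direction = "wide", idvar = "id", timevar = "year",
>                   v.names = c("RF", "RW", "RS", "MF", "MW", "MS"))
>   dim(wide)   # 1 id column + 6 scores x 12 years = 73 columns
>   # Years a student was not tested are NA; impute them, then drop or retain
>   # those imputations as described above.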
>
> Option 2:
>
> Though I am not sure, one may be able to use the following structure under
> some assumptions:
>
> Subject A:   RF1 RW1 RS1 MF1 MW1 MS1 RF2 RW2 RS2 MF2 MW2 MS2 Year=1.5
> Subject B:   RF2 RW2 RS2 MF2 MW2 MS2 RF3 RW3 RS3 MF3 MW3 MS3 Year=2.5
> Subject C:   RF3 RW3 RS3 MF3 MW3 MS3 RF4 RW4 RS4 MF4 MW4 MS4 Year=3.5
>
> Use Year as a covariate, and possibly some interactions. This assumes that
> the regression relationships are stable over time and that the residual
> covariance matrix has a common 12 by 12 block-diagonal structure.
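>
> Continuing the toy long file from the sketch above, the stacking could look
> like this in R (RF.1 ... MS.2 simply mean "first observed year" and "second
> observed year"):
>
>   first <- aggregate(year ~ id, data = long, FUN = min)   # first tested year per student
>   names(first)[2] <- "year1"
>   long2 <- merge(long, first, by = "id")
>   long2$occ <- ifelse(long2$year == long2$year1, 1, 2)    # 1st vs 2nd observed year
>   stacked <- reshape(long2[c("id", "occ", "RF", "RW", "RS", "MF", "MW", "MS")],
>                      direction = "wide", idvar = "id", timevar = "occ",
>                      v.names = c("RF", "RW", "RS", "MF", "MW", "MS"))
>   stacked <- merge(stacked, first, by = "id")
>   stacked$Year <- stacked$year1 + 0.5                     # 1.5 for years 1-2, 2.5 for years 2-3, ...
>   # Impute the 12 score columns with Year (and perhaps its interactions) as
>   # a covariate, which is where the stability assumptions above come in.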
>
> My own preference is to use Option 1 if the sample size is large and
> Option 2 if the sample size is small.
>
> Interesting problem.
>
> Raghu
>
>
>  *From:* Impute -- Imputations in Data Analysis [[email protected]] on behalf of Paul von Hippel [[email protected]]
> *Sent:* Tuesday, September 20, 2011 10:53 AM
> *To:* [email protected]
> *Subject:* Re: Imputing panel data, constraining correlations at long lags
>
>  Thanks. I thought a little about this. It's not obvious to me what the
> prior would be. Any recommendations?
>
> On Tue, Sep 20, 2011 at 9:41 AM, Juned Siddique <[email protected]> wrote:
>
>>  Hi Paul,
>>
>> If you use a Bayesian approach like Proc MI for the problem below, the
>> posterior correlation between wave 1 and 3 is just the prior correlation. So
>> one approach might be to use an informative prior for the covariance matrix,
>> which you can do in Proc MI.
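>>
>> A quick way to see why, in R (a sketch with simulated data; y1-y3 stand in
>> for the three waves): no case observes wave 1 and wave 3 together, so the
>> likelihood carries no information about that correlation.
>>
>>   library(mice)
>>   set.seed(1)
>>   n  <- 100
>>   y2 <- rnorm(n)
>>   y1 <- ifelse(seq_len(n) <= n/2, 0.9 * y2 + rnorm(n, 0, 0.4), NA)  # observed only in pattern A
>>   y3 <- ifelse(seq_len(n) >  n/2, 0.9 * y2 + rnorm(n, 0, 0.4), NA)  # observed only in pattern B
>>   md.pairs(data.frame(y1, y2, y3))$rr   # counts of jointly observed pairs;
>>                                         # the (y1, y3) entry is 0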
>>
>> -Juned
>>
>> *From:* Impute -- Imputations in Data Analysis [mailto:[email protected]] *On Behalf Of* Paul von Hippel
>> *Sent:* Tuesday, September 20, 2011 8:21 AM
>> *To:* [email protected]
>> *Subject:* Re: Imputing panel data, constraining correlations at long lags
>>
>> Thanks, Dave. You've come up with a nicely simplified version of my
>> problem. Suppose I had only three waves of data, with every subject missing
>> either wave 3 (your pattern A) or wave 1 (your pattern B). Ordinarily I
>> would put the data in wide format --
>>
>> A O1 O2 M3
>> B M1 O2 O3
>>
>> -- and impute using a multivariate normal model. However, I don't think
>> that would work in this case because the MVN model would want to estimate
>> the correlation between wave 1 and wave 3, and there are no cases where both
>> wave 1 and wave 3 are observed.
>>
>> However, if I could tell the software that this was, say, an AR(1) process
>> -- or, equivalently, that the partial correlation between waves 1 and 3 is
>> zero -- I'd be in business.
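>>
>> (A quick numerical check of that equivalence, with an arbitrary lag-1
>> correlation of .85; in R:)
>>
>>   rho <- 0.85
>>   R3  <- rho ^ abs(outer(1:3, 1:3, "-"))    # 3 x 3 AR(1) correlation matrix
>>   P   <- solve(R3)                          # precision matrix
>>   -P[1, 3] / sqrt(P[1, 1] * P[3, 3])        # partial correlation of waves 1 and 3
>>                                             # given wave 2: zero up to rounding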
>>
>> This could be done using MVN software that allowed me to impose
>> constraints on the covariance matrix, or imputation software for serially
>> correlated data. Does such software exist?
>>
>> Best,
>> Paul
>>
>>    ------------------------------
>>
>> *From:* David Judkins <[email protected]>
>> *To:* [email protected]
>> *Sent:* Tuesday, September 20, 2011 7:25 AM
>> *Subject:* Re: Imputing panel data, constraining correlations at long lags
>>
>> Paul,
>>
>> This sounds pretty challenging. Reminds me of Andrew Gelman's JSM talk
>> and 1998 JASA paper on imputation of questions not asked.
>>
>> It also reminds me of a remark some speaker made this year at JSM about
>> almost all natural processes being Markov chains. Not sure I buy that, but I
>> think he meant that if you have a rich enough state vector, then one past
>> observation is all you need. Of course, that would be trivially true if the
>> state vector contained lagged latent values. In this case, I doubt your
>> state vector is rich enough to compensate for the brevity of the
>> student-level time series, but I guess you have to work with what you have.
>>
>> Whatever you do, I imagine, will involve a lot of custom programming.
>> However, you might be able to use Raghu's IVEware on a series of specially
>> reshaped versions of your data. For example, to impute year 3 for subject A
>> and year 1 for subject B, you might create a dataset with only A and B
>> records in it, shaped like this:
>>
>> A O1 O2 M3
>> B M1 O2 O3
>>
>> Once that was done, you could proceed to impute Year 4 on the A and B
>> records and Year 2 on the C records, with a dataset shaped from the A, B,
>> and C records as
>>
>> A O2 I3 M4
>> B O2 O3 M4
>> C M2 O3 O4
>>
>> And so on. At the end of that, you would have 4 observed/imputed years
>> per subject.
>>
>> There should then be a way to generalize to more than 4 per subject. Not
>> very elegant, but it might work.
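>>
>> A rough sketch of the bookkeeping in R with mice (hypothetical wide columns
>> y1-y4, one per year, plus a type column marking the A/B/C patterns; "norm"
>> is Bayesian linear regression, and IVEware would play the same role):
>>
>>   library(mice)
>>   set.seed(1)
>>   make_rows <- function(k, obs, type) {    # k students tested in the years listed in obs
>>     m <- matrix(NA_real_, k, 4); m[, obs] <- rnorm(2 * k)
>>     data.frame(type = type, setNames(as.data.frame(m), paste0("y", 1:4)))
>>   }
>>   dat <- rbind(make_rows(30, 1:2, "A"), make_rows(30, 2:3, "B"), make_rows(30, 3:4, "C"))
>>
>>   impute_window <- function(dat, rows, years) {   # impute one 3-year window, keep the values
>>     imp <- mice(dat[rows, years], m = 1, method = "norm", printFlag = FALSE, seed = 1)
>>     dat[rows, years] <- complete(imp, 1)
>>     dat
>>   }
>>   dat <- impute_window(dat, dat$type %in% c("A", "B"), c("y1", "y2", "y3"))       # M3 for A, M1 for B
>>   dat <- impute_window(dat, dat$type %in% c("A", "B", "C"), c("y2", "y3", "y4"))  # M4 for A and B, M2 for C
>>   # ...and so on, sliding the three-year window forward.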
>>
>> --Dave
>>   ------------------------------
>>
>> *From:* Impute -- Imputations in Data Analysis [[email protected]] on behalf of Paul von Hippel [[email protected]]
>> *Sent:* Monday, September 19, 2011 5:58 PM
>> *To:* [email protected]
>> *Subject:* Imputing panel data, constraining correlations at long lags
>>
>> I have panel data where different students are tested for overlapping
>> 2-year periods.
>>
>>    - Subject A is observed for years 1 & 2.
>>    - Subject B is observed for years 2 & 3.
>>    - Subject C is observed for years 3 & 4.
>>    - etc., up to year 12 (of school)
>>
>> For each observed year there are three separate test occasions (fall,
>> winter, spring) and two subjects (reading, math).
>>
>> It seems to me I can impute the missing test scores provided I am willing
>> to assume something about lags that are 2 years or longer. For example, I
>> could assume that the partial correlation at lags of 2 years or longer is
>> zero. This is not an unreasonable assumption, since the correlations at
>> shorter lags are very strong (.8-.9).
>>
>> Is there software that will allow me to do this conveniently?
>>
>> My usual strategy is to reshape the data from long to wide and then impute
>> using a multivariate normal model. There are several packages that will
>> permit this; however, I am not aware of software that will let me constrain
>> the covariance matrix in the way I have described.
>>
>> I have not used imputation software that is tailored for panel data --
>> such as Schafer et al.'s PAN package, recently ported from S-Plus to R. I
>> could try that, provided there is a convenient way to restrict the long
>> lags.
>>
>> Thanks!
>>
>> --
>> Best wishes,
>> Paul von Hippel
>> Assistant Professor
>> LBJ School of Public Affairs
>> Sid Richardson Hall 3.251
>> University of Texas, Austin
>> 2315 Red River, Box Y
>> Austin, TX  78712
>>
>> mobile, preferred (614) 282-8963
>> office (512) 232-3650****
>>
>> ** **
>>
>
>
>
> --
> Best wishes,
> Paul von Hippel
> Assistant Professor
> LBJ School of Public Affairs
> Sid Richardson Hall 3.251
> University of Texas, Austin
> 2315 Red River, Box Y
> Austin, TX  78712
>
> mobile, preferred (614) 282-8963
> office (512) 232-3650
>



-- 
Best wishes,
Paul von Hippel
Assistant Professor
LBJ School of Public Affairs
Sid Richardson Hall 3.251
University of Texas, Austin
2315 Red River, Box Y
Austin, TX  78712

mobile, preferred (614) 282-8963
office (512) 232-3650
