Hi Everyone, 

I am new to multiple imputation, and would like to get your advice about a 
project that I am working on. I am interested in testing a model involving a 
latent interaction term using 2sls. My data are drawn from a 4-wave 13-year 
longitudinal study of married couples. The sample consists of 154 couples. Data 
for the early years of marriage were gathered at three annual intervals 
beginning when couples were newlyweds in 1980-1981. Follow-up data then were 
gathered on a fourth occasion in 1994-1995. 

Each phase of the study consisted of a primary face-to-face interview and a 
series of follow-up daily diary telephone interviews. Primary interviews were 
used to gather data on people’s perceptions of their partner’s traits and their 
feelings of marital satisfaction. The follow-up interviews were used to gather 
"quasi-observational" data on socioemotional behaviors in marriage, such as 
affection, negativity, and sexual expression. 

During the 13-year follow-up, data also were gathered on marital stability. In 
conjunction with the data on marital satisfaction, this made it possible to 
categorize couples into four marital outcome groups: happily married, unhappily 
married, early divorced, and later divorced. 

In my study, I would like to test whether people’s tendency to idealize their 
partner in the early years of marriage is associated with marital outcomes 13 
years after couples were first married. In my analysis, I want to use 2sls to 
regress a manifest measure of the extent to which people perceive that their 
intimate partner is pleasant to be around on a latent measure of the partner’s 
tendency to engage in pleasant behavior, a latent measure of the partner’s 
tendency to engage in unpleasant behavior, and the interaction between the two. 
Then I want to save the residual in prediction as a measure of idealization for 
use in subsequent analyses. The latent measure of pleasant behavior and the 
latent measure of unpleasant behavior both have four indicators. The latent 
interaction has 10 indicators – a scaling indicator that is formed by 
multiplying the scaling indicators of the latent measure of pleasant behavior 
and the latent measure of unpleasant behavior, and 9 additional
 indicators that are formed by multiplying each of the nonscaling indicators of 
the latent measure of pleasant behavior with each of the nonscaling indicators 
of the latent measure of unpleasant behavior. 

A 2sls analysis of the model described above consists of two steps. First, the 
scaling indicators for each latent are regressed on the non-scaling indicators 
for that latent as well as the non-scaling indicators from the other latents in 
the model and predicted values are obtained. Then the criterion is regressed on 
the predicted values for each of the scaling indicators. The basic idea is that 
because the non-scaling indicators are correlated with the scaling indicators, 
but not with the disturbance term in their measurement equations, they can be 
used to purge the scaling indicators of measurement error, thereby yielding 
estimates of the underlying latents.
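
For concreteness, here is a rough sketch of those two steps in Python
(statsmodels), reusing the hypothetical column names above plus a hypothetical
criterion column y. It only illustrates the two regressions; the naive
second-stage standard errors would still need the usual 2sls correction.

import pandas as pd
import statsmodels.api as sm

def two_stage_ls(df, criterion="y", scaling=("p1", "u1", "pu1")):
    # Instruments: all nonscaling indicators of all three latents.
    instruments = (["p2", "p3", "p4", "u2", "u3", "u4"]
                   + [f"p{i}u{j}" for i in (2, 3, 4) for j in (2, 3, 4)])
    Z = sm.add_constant(df[instruments])

    # Stage 1: regress each scaling indicator on the instruments and keep
    # the fitted values, which are purged of measurement error.
    fitted = pd.DataFrame(index=df.index)
    for s in scaling:
        fitted[s + "_hat"] = sm.OLS(df[s], Z).fit().fittedvalues

    # Stage 2: regress the criterion on the purged scaling indicators.
    stage2 = sm.OLS(df[criterion], sm.add_constant(fitted)).fit()

    # Residual from stage 2, used here as the idealization measure.
    idealization = df[criterion] - stage2.fittedvalues
    return stage2, idealization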

Anyway, I’m trying to use multiple imputation to deal with missing observations 
in a data set that contains the variables that I will need to test my model for 
husbands and wives at three points in time during the early years of marriage. 
This data set consists of slightly more than 100 variables, roughly half of 
which are interaction terms. Ideally, I would like to be able to preserve 
associations in these variables across both gender and time. So my sense is 
that I need to avoid dividing my imputation data set into a number of smaller 
data sets (e.g., male variables at time 1, female variables at time 1, etc.). 

Unfortunately though, when I try to use this larger data set in both SAS PROC 
MI and NORM, I have trouble getting the EM algorithm to converge. In SAS, this 
data set also creates an error that causes the program to terminate. SAS isn’t 
too specific about the nature of the problem. It just says things like "invalid 
operation" and "generic error." So I’m hoping that people can give me some 
advice on what I should do next. 

The indicators for the latent predictors tend to be non-normal with a lot of 
positive skew. In addition, there are often one or two outliers in the high end 
of the distribution that are detached from the rest of the distribution. So, 
for example, if one looks at people’s reports of the average number of times 
per day that their spouse expresses physical affection toward them (e.g., 
kissing, hugging, cuddling), most scores are in the lower end of the 
distribution but there are some extremely high scores as well. 

So far, I’ve tried removing the extreme scores from the data but this doesn’t 
fix the problems with non-convergence and crashing that I get with the larger 
imputation data set. The next thing I’m planning to do is to transform the 
variables in the hope of attaining multivariate normality. My guess is that 
this will not be successful, but I’ll give it a try. My sense though is that 
the failure to attain multivariate normality may not be at the heart of the 
problem. According to the documentation for SAS PROC MI, this typically is not a 
problem unless a large percentage of the data are missing. My rate of missing 
information never exceeds 24%. Mind you, the SAS documentation doesn’t say what 
constitutes a large amount of missing data. So maybe this is a lot. 
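
If it helps, a minimal sketch of that transformation step (Python, with
hypothetical column names): log-transform the skewed count-like variables
before running the imputation, then undo the transform on the completed data
sets.

import numpy as np

SKEWED = ["affection_per_day", "negativity_per_day"]  # hypothetical names

def to_log_scale(df, cols=SKEWED):
    out = df.copy()
    out[cols] = np.log1p(out[cols])   # log(1 + x), so zero counts are allowed
    return out

def to_raw_scale(df, cols=SKEWED):
    out = df.copy()
    out[cols] = np.expm1(out[cols])   # inverse of log1p
    return out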

Another possibility might be large discrepancies in variance among the 
variables. Some of the behaviors in my data occur much more frequently than do 
others. For example, positive behaviors such as expressing physical affection 
tend to occur more frequently than do negative behaviors such as showing anger 
or impatience by snapping, yelling, or raising one’s voice. So I was thinking 
that it might be possible to artificially increase the variance of some 
variables or decrease the variance of others. Maybe this could be accomplished 
by multiplying or dividing by a constant (or adding or subtracting one). But I 
don't know if this sort of thing is normally done, or if large discrepancies in 
the frequency at which different behaviors occur are even likely to be part of 
the problem.
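
For what it's worth, a linear rescaling like that is easy to try. A minimal
sketch (Python, hypothetical usage) would be to standardize the variables
before imputation and map the completed values back afterwards; this changes
the numerical conditioning of the problem without changing what a
multivariate-normal imputation model can represent.

import pandas as pd  # assumed input: a pandas DataFrame of numeric variables

def standardize(df):
    means, sds = df.mean(), df.std()
    return (df - means) / sds, means, sds

def unstandardize(df_z, means, sds):
    return df_z * sds + means

# Hypothetical usage around whatever imputation routine is used:
# z, m, s = standardize(raw_data)
# completed_z = run_imputation(z)        # placeholder for PROC MI / NORM / etc.
# completed = unstandardize(completed_z, m, s)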

At any rate, any advice that people could give me about what to do next would 
be greatly appreciated. Sorry for what is admittedly a long posting, but I felt 
that I would be most likely to get good advice if I took the time to give a 
detailed description of my project.

Thanks, 

Paul 



From link <@t> umich.edu  Mon Jul 28 15:44:28 2003
From: link <@t> umich.edu (Steve Peck)
Date: Sun Jun 26 08:25:00 2005
Subject: IMPUTE: Re: combine before instead of after?
References: <[email protected]>
Message-ID: <[email protected]>

OK, this makes sense to me. Thank you for the succinct replies.

I now have estimates from each of my regression models run on the 5 imputed
data sets.

I would like to describe these models using standardized betas.

I plan to use MIANALYSIS to combine the 5 sets of results.
The output from SPSS appears to write only the unstandardized betas
  (for input into MIANALYSIS).

Any ideas about how I can get the standardized beta estimates?
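
Not an SPSS/MIANALYSIS answer, but as a rough sketch of the combining step
itself: one option is to compute the standardized beta within each imputed
data set (b * sd(x) / sd(y)), together with a correspondingly rescaled standard
error (an approximation, since it ignores the sampling variability of the
standard deviations), and then pool the five values with Rubin's rules, e.g.:

import numpy as np

def rubin_pool(estimates, std_errors):
    """Pool one coefficient across m imputed data sets with Rubin's rules."""
    m = len(estimates)
    q_bar = np.mean(estimates)                 # pooled point estimate
    u_bar = np.mean(np.square(std_errors))     # within-imputation variance
    b = np.var(estimates, ddof=1)              # between-imputation variance
    t = u_bar + (1.0 + 1.0 / m) * b            # total variance
    return q_bar, np.sqrt(t)

# Hypothetical example with 5 standardized betas and their standard errors:
pooled_beta, pooled_se = rubin_pool([0.21, 0.18, 0.24, 0.20, 0.22],
                                    [0.05, 0.05, 0.06, 0.05, 0.05])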
 


Donald Rubin wrote:

>Yup, wrong answer, unless the statistic is linear in all the missing data, 
>i.e., this could only work in your case if the only variable with 
>missingness is y.  And even then, all the standard errors and tests are 
>wrong.  Not a very successful path to follow.
>
>
>On Fri, 25 Jul 2003, Steve Peck wrote:
>
>  
>
>>Assuming a set of 20 continuous variables,
>>are there specific reasons for *not* combining the
>>results of 5 MI data sets before doing regression analyses
>>(e.g., by computing value estimates by averaging across
>>the 5 values per variable) instead of combining the parameter
>>estimates that are generated from each of the 5 models run
>>separately?
>>
>>thanks,
>>Steve

-- 
Stephen C. Peck
Senior Research Associate Social Science
University of Michigan
204 S. State St. # 1239
Ann Arbor, MI  48109-1290
(734) 647-3683; fax (734) 936-7370
[email protected]
http://www.rcgd.isr.umich.edu/garp/


From newgardc <@t> ohsu.edu  Wed Jul 30 12:13:40 2003
From: newgardc <@t> ohsu.edu (Craig Newgard)
Date: Sun Jun 26 08:25:00 2005
Subject: IMPUTE: Re: Beginner Question about Nonconvergence of EM
        Algorithm
Message-ID: <[email protected]>

Paul,
I'm not sure that I follow your full description of the question, but I have a 
few suggestions.  For using MI when your hypothesis centers on testing an 
interaction term (or terms), Paul Allison (see below) offers a nice explanation 
of how to design your imputation model to maximize the statistical efficiency 
of that term (i.e., parallel chains of MI, split on one of the interaction 
terms).  This design will need to be adjusted if both terms in the interaction 
have missing values.  If you are having a difficult time getting the SAS PROC 
MI algorithm to converge, you may want to try another program, such as IVEware, 
which fits a more flexible model for each variable being imputed (a beta 
version of IVEware can be downloaded for free).
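
On the "split on one of the interaction terms" idea: my reading is that, when
the moderator is a fully observed grouping variable, you run the imputation
separately within each of its levels, so the within-group covariance structure
(and hence the interaction) is preserved. A rough sketch, with a hypothetical
grouping column and a placeholder impute_fn standing in for whatever imputation
routine is actually used:

import pandas as pd

def impute_within_groups(df, group_col, impute_fn):
    """Run the imputation separately within each level of group_col."""
    completed = []
    for level, sub in df.groupby(group_col):
        done = impute_fn(sub.drop(columns=[group_col]))  # placeholder routine
        done[group_col] = level
        completed.append(done)
    return pd.concat(completed).sort_index()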

Craig

Allison PD (2001). Missing Data. Sage University Papers Series on Quantitative 
Applications in the Social Sciences, 07-136. Thousand Oaks, CA: Sage.
Raghunathan TE, Lepkowski JM, Van Hoewyk J, Solenberger PW (2001). A 
multivariate technique for multiply imputing missing values using a sequence 
of regression models. Survey Methodology, 27, 85-95.

Craig D. Newgard, MD, MPH
Assistant Professor
Department of Emergency Medicine
Department of Public Health & Preventative Medicine
Oregon Health & Science University
3181 Sam Jackson Park Road
Mail Code CR-114
Portland, OR 97201-3098
(503) 494-1668 (Office)
(503) 494-4640 (Fax)
[email protected]



From G.Raab <@t> napier.ac.uk  Wed Jul 30 18:22:13 2003
From: G.Raab <@t> napier.ac.uk (Raab, Gillian)
Date: Sun Jun 26 08:25:00 2005
Subject: IMPUTE: Re: Beginner Question about Nonconvergence of EM Algorithm
Message-ID: <[email protected]>


I agree with Craig's suggestion to use IVEware. It is much better than NORM
etc. for categorical data.

I assume that you were using the facilities in PROC MI that pull the imputed
normal values back to categories. When I tried to use this a couple of years
back, using SAS 8.1 (and also 8.2), there were some nasty bugs that would
result in errors in a few variables, sporadically rather than every time. What
happened was that sometimes I would find a handful of non-missing values had
been replaced. I sent all kinds of messages to SAS about this but did not get
much joy from them; they just said that the procedure was unsupported, so I
gave up and fudged that problem in some other way. Has anyone else on this
list had similar problems? And are they fixed now?

Since then I have found IVEware much better. It will do everything that MI
does and more, though it is a bit harder to use. I started to write some
notes as a bit of an idiot's guide to it, but they are still incomplete. The
only problem I faced with it was the need to ensure that all continuous
variables were approximately centred to avoid computational difficulties.
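
A minimal sketch of that centring step (done outside IVEware, before handing
the data over; the list of continuous columns is whatever applies to your data):

import pandas as pd  # assumed input: a pandas DataFrame

def center(df, continuous_cols):
    out = df.copy()
    out[continuous_cols] = out[continuous_cols] - out[continuous_cols].mean()
    return out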

Good luck

Gillian Raab
Edinburgh
