IMPUTE: Re: Beginner Question about Nonconvergence of EM Algorithm

Craig Newgard Wed, 30 Jul 2003 10:14:13 -0700

Paul,

I'm not sure that I follow your full description of the question, but I have a few suggestions. When your hypothesis centers on testing an interaction term, Paul Allison (see below) offers a nice explanation of how to design the imputation model to maximize the statistical efficiency of that term (i.e., run parallel chains of MI, split on one of the variables in the interaction). This design will need to be adjusted if both variables in the interaction have missing values. If you are having a difficult time getting the SAS PROC MI algorithm to converge, you may want to try another program, such as IVEware, which uses more flexible models chosen for each variable being imputed (a beta version of IVEware can be downloaded for free).

Craig

Allison PD. (2001) Missing Data. Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-136. Thousand Oaks, CA: Sage.
Raghunathan TE, Lepkowski JM, Van Hoewyk J, Solenberger PW. (2001) A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, 27:85-95.
Craig D. Newgard, MD, MPH
Assistant Professor
Department of Emergency Medicine
Department of Public Health & Preventative Medicine
Oregon Health & Science University
3181 Sam Jackson Park Road
Mail Code CR-114
Portland, OR 97201-3098
(503) 494-1668 (Office)
(503) 494-4640 (Fax)
[EMAIL PROTECTED]

>>> Paul Miller <[EMAIL PROTECTED]> 07/27/03 08:19AM >>>

Hi Everyone,

I am new to multiple imputation, and would like to get your advice about a project that I am working on. I am interested in testing a model involving a latent interaction term using 2sls. My data are drawn from a 4-wave 13-year longitudinal study of married couples. The sample consists of 154 couples. Data for the early years of marriage were gathered at three annual intervals beginning when couples were newlyweds in 1980-1981. Follow-up data then were gathered on a fourth occasion in 1994-1995.

Each phase of the study consisted of a primary face-to-face interview and a series of follow-up daily diary telephone interviews. Primary interviews were used to gather data on people’s perceptions of their partner’s traits and their feelings of marital satisfaction. The follow-up interviews were used to gather "quasi-observational" data on socioemotional behaviors in marriage, such as affection, negativity, and sexual expression.

During the 13-year follow-up, data also were gathered on marital stability. In conjunction with the data on marital satisfaction, this made it possible to categorize couples into four marital outcome groups: happily married, unhappily married, early divorced, and later divorced.

In my study, I would like to test whether people’s tendency to idealize their partner in the early years of marriage is associated with marital outcomes 13 years after couples were first married. In my analysis, I want to use 2sls to regress a manifest measure of the extent to which people perceive that their intimate partner is pleasant to be around on a latent measure of the partner’s tendency to engage in pleasant behavior, a latent measure of the partner’s tendency to engage in unpleasant behavior, and the interaction between the two. Then I want to save the residual in prediction as a measure of idealization for use in subsequent analyses. The latent measure of pleasant behavior and the latent measure of unpleasant behavior both have four indicators. The latent interaction has 10 indicators – a scaling indicator that is formed by multiplying the scaling indicators of the latent measure of pleasant behavior and the latent measure of unpleasant behavior, and 9 additional indicators that are formed by multiplying each of the nonscaling indicators of the latent measure of pleasant behavior with each of the nonscaling indicators of the latent measure of unpleasant behavior.
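To make the count of interaction indicators concrete, here is a minimal Python sketch of the construction described above. The indicator names (p1..p4, u1..u4), the example row, and both helper functions are hypothetical and only for illustration; the first name in each list is assumed to be the scaling indicator.

```python
from itertools import product

def product_pairs(pleasant, unpleasant):
    """Pair up indicators for the latent interaction.

    The scaling indicators (first in each list) are paired with each other;
    every non-scaling pleasant indicator is paired with every non-scaling
    unpleasant indicator.
    """
    scaling = (pleasant[0], unpleasant[0])
    nonscaling = list(product(pleasant[1:], unpleasant[1:]))
    return [scaling] + nonscaling

def product_scores(row, pairs):
    """Multiply each pair for one respondent; a missing component gives a missing product."""
    scores = {}
    for a, b in pairs:
        name = a + "_x_" + b
        if row.get(a) is None or row.get(b) is None:
            scores[name] = None
        else:
            scores[name] = row[a] * row[b]
    return scores

# Hypothetical indicator names and values, not taken from the study.
pleasant = ["p1", "p2", "p3", "p4"]
unpleasant = ["u1", "u2", "u3", "u4"]
pairs = product_pairs(pleasant, unpleasant)

row = {"p1": 2.0, "p2": 1.0, "p3": None, "p4": 3.0,
       "u1": 0.5, "u2": 1.0, "u3": 2.0, "u4": 0.0}
scores = product_scores(row, pairs)

print(len(pairs))  # 1 scaling product + 3 * 3 non-scaling products
print(sum(v is None for v in scores.values()))  # products lost to the one missing indicator
```

In this made-up row, a single missing indicator (p3) leaves three of the ten product terms missing, which is one reason the interaction columns can carry more missingness than the raw indicators.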

A 2sls analysis of the model described above consists of two steps. First, the scaling indicators for each latent are regressed on the non-scaling indicators for that latent as well as the non-scaling indicators from the other latents in the model and predicted values are obtained. Then the criterion is regressed on the predicted values for each of the scaling indicators. The basic idea is that because the non-scaling indicators are correlated with the scaling indicators, but not with the disturbance term in their measurement equations, they can be used to purge the scaling indicators of measurement error, thereby yielding estimates of the underlying latents.
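The two steps can be sketched on simulated data. This is only an illustration under stated assumptions, not the analysis from the study: there is a single latent predictor, all variables are simulated, and the helper names (`ols`, `predict`) and the true slope of 2.0 are invented for the example. The scaling indicator is regressed on three non-scaling indicators, and the criterion is then regressed on the predicted values.

```python
import random

def ols(X, y):
    """Least-squares coefficients of y on the columns of X (intercept first)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Normal equations A b = c, with A = X'X and c = X'y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
                c[r] -= f * c[col]
    return [c[i] / A[i][i] for i in range(k)]

def predict(coefs, X):
    """Fitted values from an intercept-first coefficient list."""
    return [coefs[0] + sum(b * v for b, v in zip(coefs[1:], r)) for r in X]

# Simulated data (assumption): one latent variable measured with error.
rng = random.Random(0)
n = 2000
true_slope = 2.0
xi = [rng.gauss(0, 1) for _ in range(n)]                    # latent predictor
x1 = [v + rng.gauss(0, 1) for v in xi]                      # scaling indicator
Z = [[v + rng.gauss(0, 1) for _ in range(3)] for v in xi]   # non-scaling indicators
y = [true_slope * v + rng.gauss(0, 1) for v in xi]          # criterion

# Naive regression on the error-laden scaling indicator.
naive = ols([[v] for v in x1], y)[1]

# Step 1: regress the scaling indicator on the non-scaling indicators.
x1_hat = predict(ols(Z, x1), Z)
# Step 2: regress the criterion on the predicted values.
tsls = ols([[v] for v in x1_hat], y)[1]

print(round(naive, 2), round(tsls, 2))
```

In this setup the naive slope is pulled toward zero by the measurement error in the scaling indicator, while the second-stage slope should land much closer to the slope used to generate the data.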

Anyway, I’m trying to use multiple imputation to deal with missing observations in a data set that contains the variables that I will need to test my model for husbands and wives at three points in time during the early years of marriage. This data set consists of slightly more than 100 variables, roughly half of which are interaction terms. Ideally, I would like to be able to preserve associations in these variables across both gender and time. So my sense is that I need to avoid dividing my imputation data set into a number of smaller data sets (e.g., male variables at time 1, female variables at time 1, etc.).

Unfortunately though, when I try to use this larger data set in both SAS PROC MI and NORM, I have trouble getting the EM algorithm to converge. In SAS, this data set also creates an error that causes the program to terminate. SAS isn’t too specific about the nature of the problem. It just says things like "invalid operation" and "generic error." So I’m hoping that people can give me some advice on what I should do next.

The indicators for the latent predictors tend to be non-normal with a lot of positive skew. In addition, there are often one or two outliers in the high end of the distribution that are detached from the rest of the distribution. So, for example, if one looks at people’s reports of the average number of times per day that their spouse expresses physical affection toward them (e.g., kissing, hugging, cuddling), most scores are in the lower end of the distribution but there are some extremely high scores as well.

So far, I’ve tried removing the extreme scores from the data, but this doesn’t fix the problems with non-convergence and crashing that I get with the larger imputation data set. The next thing I’m planning to do is to transform the variables in the hope of attaining multivariate normality. My guess is that this will not be successful, but I’ll give it a try. My sense though is that the failure to attain multivariate normality may not be at the heart of the problem. According to the documentation for SAS PROC MI, this typically is not a problem unless a large percentage of the data are missing. My rate of missing information never exceeds 24%. Mind you, the SAS documentation doesn’t say what constitutes a large amount of missing data. So maybe this is a lot.
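As a rough illustration of what such a transformation does to a positively skewed variable with a detached high score, here is a small Python sketch. The values are made up, and the choice of log(1 + x) is an assumption for the example; other transformations may suit the actual data better.

```python
import math
import statistics

def skewness(values):
    """Population skewness: the mean cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Made-up daily affection counts with one detached high score.
affection = [0.5, 1.0, 1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 4.0, 25.0]

# log(1 + x) keeps zeros defined and compresses the upper tail.
logged = [math.log1p(v) for v in affection]

print(skewness(affection), skewness(logged))
```

With these numbers the transformation lowers the skew but does not remove it, because the detached score is still far from the rest after logging.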

Another possibility might be large discrepancies in variance among the variables. Some of the behaviors in my data occur much more frequently than do others. For example, positive behaviors such as expressing physical affection tend to occur more frequently than do negative behaviors such as showing anger or impatience by snapping, yelling, or raising one’s voice. So I was thinking that it might be possible to artificially increase the variance of some variables or decrease the variance of others. Maybe this could be accomplished by multiplying or dividing (or adding or subtracting) a constant. But I don’t know if this sort of thing is normally done, or if large discrepancies in the frequency at which different behaviors occur are even likely to be part of the problem.
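On the arithmetic of rescaling: multiplying or dividing by a constant changes a variable's variance, whereas adding or subtracting a constant shifts the mean and leaves the variance unchanged. The Python sketch below shows one way to put variables on a common scale and undo it afterwards. The data and the helper names (`standardize`, `unstandardize`) are hypothetical, and whether rescaling helps with the convergence problem is not something this example can show.

```python
import statistics

def standardize(values):
    """Scale observed values to mean 0 and SD 1; keep None for missing entries."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    scaled = [None if v is None else (v - mean) / sd for v in values]
    return scaled, mean, sd

def unstandardize(scaled, mean, sd):
    """Map values (e.g. after imputation) back to the original metric."""
    return [None if z is None else z * sd + mean for z in scaled]

# Made-up daily counts: a frequent behavior and a rare one, with missing entries.
affection = [12.0, 8.5, None, 15.0, 9.5]
anger = [0.2, None, 0.1, 0.4, 0.3]

aff_z, aff_mean, aff_sd = standardize(affection)
ang_z, ang_mean, ang_sd = standardize(anger)

# Adding a constant leaves the variance as it was.
observed = [v for v in affection if v is not None]
shifted = [v + 100.0 for v in observed]
print(statistics.variance(observed), statistics.variance(shifted))

# After standardizing, both variables have unit variance.
print(statistics.variance([z for z in aff_z if z is not None]),
      statistics.variance([z for z in ang_z if z is not None]))
```

Because the rescaling is linear, `unstandardize` recovers the original metric exactly, so values imputed on the standardized scale can be converted back.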

At any rate, any advice that people could give me about what to do next would be greatly appreciated. Sorry for what is admittedly a long posting, but I felt that I would be most likely to get good advice if I took the time to give a detailed description of my project.

Thanks,

Paul
