Hi folks, I'm back with another question about using multiple imputation (or FIML) to handle a sticky missing data problem. Apologies in advance for my overly long explanation.
One of my students is analyzing data from a large sample of men who used an online sex website (oriented to men who have sex with men). The study used a daily diary methodology, wherein each participant was asked to complete a daily record of his sexual experiences over a 30-day period. Our missing data problem concerns her main outcome variable, which is the number of experiences of condomless receptive anal sex each day.

The online survey was designed so that a person first would indicate how many instances of receptive anal sex he had that day. If the person responded "0," then he would be routed to an entirely different set of questions (questions having nothing to do with receptive anal sex). However, if the person responded with a number greater than 0, he would then be routed to a set of questions asking him to describe his first experience, his second experience, and so on. For each experience, a question asked whether the person's sex partner used a condom.

In a number of cases, a person indicated having had a certain number of instances of receptive anal sex that day but dropped out of the survey before reporting on all of the experiences. For example, a person might have said that he had 2 experiences of receptive anal sex on a day but reported on only 1 of those 2 experiences.

It isn't clear to us how we should deal with these missing data. One option would be to impute at the item level (for the item "Did your partner use a condom during this experience?"). We then could compute the outcome variable and proceed as usual (although I have never used multiple imputation with multilevel data!). Another option would be to treat the outcome variable as missing but use as auxiliary variables all available data on individual sexual experiences.

I guess another option would be to do this in the context of a covariance structure model (using FIML for missing data). In that case, we could define a formative factor representing the outcome variable (number of condomless receptive anal sex experiences), using all available data on experiences from that day. Or, alternatively, we could code the outcome variable as missing for people who did not report on all of their daily experiences, and then use all available data on individual experiences as auxiliary variables.

I would be grateful for any thoughts about how best to handle this situation. In addition to the fact that we have missing data, the situation is complicated by the facts that (a) the data are multilevel, (b) the missing item responses are for a binary variable, and (c) the outcome is a count variable.

Thanks in advance for your insights!

Jon

--
***Please note change of email to [email protected]***
Jonathan Mohr
Assistant Professor
Department of Psychology
Biology-Psychology Building
University of Maryland
College Park, MD 20742-4411
Office phone: 301-405-5907
Fax: 301-314-5966
Email: [email protected]
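To make the "code the outcome as missing, keep the item data as auxiliary variables" option concrete, here is a minimal Python sketch of the day-level coding. The function name `summarize_day`, its arguments, and the toy records are hypothetical illustrations, not taken from the study's data, and the sketch assumes each described experience has a yes/no condom response. It does no imputation; it only shows which quantities are observed on an incomplete day.

```python
from typing import Dict, List, Optional


def summarize_day(n_reported: int, condom_used: List[bool]) -> Dict[str, Optional[int]]:
    """Code one diary day (hypothetical layout).

    n_reported  -- number of receptive anal sex experiences the person said he had
    condom_used -- one True/False entry per experience he actually described;
                   shorter than n_reported if he dropped out of the survey early
    """
    n_described = len(condom_used)
    if n_described > n_reported:
        raise ValueError("more experiences described than reported")

    # Condomless experiences among those that were actually described.
    observed_condomless = sum(1 for used in condom_used if not used)
    n_undescribed = n_reported - n_described

    return {
        # Outcome is known only when every reported experience was described
        # (this includes days with 0 reported experiences).
        "outcome": observed_condomless if n_undescribed == 0 else None,
        # Candidate auxiliary variables built from the partial item data.
        "observed_condomless": observed_condomless,
        "n_described": n_described,
        # The true count must lie between these two values.
        "lower_bound": observed_condomless,
        "upper_bound": observed_condomless + n_undescribed,
    }


# Toy days: no experiences, a partial report, and a complete report.
toy_days = [(0, []), (2, [False]), (2, [False, True])]
for n_reported, items in toy_days:
    print(n_reported, items, summarize_day(n_reported, items))
```

On a partially reported day the outcome is left missing, but the observed condomless count and the number of undescribed experiences still bound the true value, which is the information any imputation or auxiliary-variable approach would be drawing on.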
