Jonathan, I think the ad hoc solutions, especially setting partial data to missing, could easily give you some bias.

Perhaps if you stay at the "atomic" level of intercourse occasions this might work better, so that you're not deriving a variable, then imputed, from incomplete lower-level data.

Here's my thinking-aloud: Your observations are occasions (a variable number of them, often zero) nested within days nested within participants. The outcome is the binary condom/no-condom for the occasion. Depending on the software you're using, this would avoid the need for imputation, and avoid thorny problems such as imputing values for occasions that never existed. There are just occasions for which the response is missing, and can be treated as an MAR problem. Variable numbers of lower-level occasions do not cause a problem.

You'd likely want to include whatever data you have on those incomplete occasions -- especially its sequence number in the day -- to bolster the MAR assumption. If that works, then occasions become exchangeable.

A binary at the level of day indicating 0 / >0 would make this similar to a semicontinuous data model, but that might not be necessary, since it's implied. Just spitballing.

I think Mplus could do this with the WLSMV estimator, but that's a hunch. Actually, GLIMMIX in SAS might be able to do it.

Hope this helps,
Pat

On Wed, May 7, 2014 at 3:41 PM, Jonathan Mohr <[email protected]> wrote:
> Hi folks,
> I'm back with another question about using multiple imputation (or FIML)
> to handle a sticky missing data problem. Apologies in advance for my overly
> long explanation.
>
> One of my students is analyzing data from a large sample of men who used
> an online sex website (oriented to men who have sex with men). The study
> used a daily diary methodology, wherein each participant was asked to
> complete a daily record of their sexual experiences over a 30 day period.
> Our missing data problem concerns her main outcome variable, which is
> number of experiences of condomless receptive anal sex each day.
>
> The online survey was designed so that a person first would indicate how
> many instances of receptive anal sex they had that day. If the person
> responded "0," then he would be routed to an entirely different set of
> questions (questions having nothing to do with receptive anal sex).
> However, if the person responded with a number greater than 0, he would
> then be routed to a set of questions asking him to describe his first
> experience, his second experience (and so on). For each experience, a
> question asked whether the person's sex partner used a condom.
>
> In a number of cases, a person might indicate having had a certain number
> of instances of receptive anal sex that day but drop out of the survey before
> reporting on all of the experiences. For example, a person might have said
> that he had 2 experiences of receptive anal sex on a day, but he only
> reported on 1 of those two experiences.
>
> It isn't clear to us how we should deal with these missing data. One
> option would be to impute at the item level (for the item "Did your partner
> use a condom during this experience?"). We then could compute the outcome
> variable and proceed as usual (although I have never used multiple
> imputation with multilevel data!). Another option would be to treat the
> outcome variable as missing but use as auxiliary variables all available
> data on individual sexual experiences.
>
> I guess another option would be to do this in the context of a covariance
> structure model (using FIML for missing data). In that case, we could
> define a formative factor representing the outcome variable (number of
> condomless receptive anal sex experiences), using all available data on
> experiences from that day. Or, alternatively, we could code the outcome
> variable as missing for people who did not report on all of their daily
> experiences, and then use all available data on individual experiences as
> auxiliary variables.
>
> I would be grateful for any thoughts about how to best handle this
> situation. In addition to the fact that we have missing data, the situation
> is complicated by the facts that (a) the data are multilevel, (b) the
> missing item responses are for a binary variable, and (c) the outcome is a
> count variable.
>
> Thanks in advance for your insights!
> Jon
>
> --
> ***Please note change of email to [email protected]***
>
> Jonathan Mohr
> Assistant Professor
> Department of Psychology
> Biology-Psychology Building
> University of Maryland
> College Park, MD 20742-4411
>
> Office phone: 301-405-5907
> Fax: 301-314-5966
> Email: [email protected]
>

--
Patrick S. Malone, Ph.D., Associate Professor
Department of Psychology
University of South Carolina
Director, Researching Adolescent Problem Behaviors Laboratory
Yahoo Messenger: patricksmalone
AOL Instant Messenger: pat2048
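
For anyone who wants to try the occasion-level layout suggested in the reply, here is a minimal Python sketch of the restructuring step only: one row per occasion, with the condom response left missing for occasions the participant never reported on. The field names (`pid`, `day`, `n_occasions`, `condom`, `seq`) and the example records are hypothetical, not taken from the study, and the sketch assumes answers were given in order, so dropout always leaves the later occasions unanswered. It does not fit any model; the resulting long-format rows would still need to go to multilevel software such as those mentioned above.

```python
# Hypothetical day-level records. "condom" holds the answers actually given,
# in order (1 = condom used, 0 = no condom). Fewer answers than n_occasions
# means the participant dropped out of that day's survey early.
day_records = [
    {"pid": 1, "day": 1, "n_occasions": 2, "condom": [1]},  # reported 1 of 2
    {"pid": 1, "day": 2, "n_occasions": 0, "condom": []},   # no occasions
    {"pid": 2, "day": 1, "n_occasions": 1, "condom": [0]},  # fully reported
]


def to_occasion_rows(records):
    """Expand day-level records into one row per occasion.

    Days with zero occasions contribute no rows, so nothing is created
    for occasions that never existed. Unreported occasions get
    condom=None, with their within-day sequence number kept as a covariate.
    """
    rows = []
    for rec in records:
        for seq in range(1, rec["n_occasions"] + 1):
            answered = seq <= len(rec["condom"])
            rows.append({
                "pid": rec["pid"],
                "day": rec["day"],
                "seq": seq,  # sequence number of the occasion within the day
                "condom": rec["condom"][seq - 1] if answered else None,
            })
    return rows


occasion_rows = to_occasion_rows(day_records)
for row in occasion_rows:
    print(row)
```

Keeping `seq` on every row, including the unanswered ones, is what lets the sequence number serve as a predictor in support of the MAR assumption.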
