Hi folks,
I'm back with another question about using multiple imputation (or FIML) to
handle a sticky missing data problem. Apologies in advance for my overly
long explanation.

One of my students is analyzing data from a large sample of men who used an
online sex website (oriented to men who have sex with men). The study used
a daily diary methodology, wherein each participant was asked to complete a
daily record of their sexual experiences over a 30-day period. Our missing
data problem concerns her main outcome variable, which is number of
experiences of condomless receptive anal sex each day.

The online survey was designed so that a person first would indicate how
many instances of receptive anal sex they had that day. If the person
responded "0," then he would be routed to an entirely different set of
questions (questions having nothing to do with receptive anal sex).
However, if the person responded with a number greater than 0, he would
then be routed to a set of questions asking him to describe his first
experience, his second experience (and so on). For each experience, a
question asked whether the person's sex partner used a condom.

In a number of cases, a person might indicate having had a certain number
of instances of receptive anal sex that day but drop out of the survey before
reporting on all of the experiences. For example, a person might have said
that he had 2 experiences of receptive anal sex on a day, but he only
reported on 1 of those two experiences.
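
To make the structure of the problem concrete, here is a minimal Python
sketch of how one diary day could be scored. The function name and the
input layout are made up for illustration (they are not our actual survey
export). It assumes we know the number of experiences the person said he
had that day, plus one condom response per experience he actually
described. For a partially reported day, the outcome is not known exactly,
but it is bounded by what was observed.

```python
from typing import List, Optional, Tuple


def day_outcome(
    n_reported: int, condom_used: List[Optional[bool]]
) -> Tuple[Optional[int], int, int]:
    """Score one diary day (hypothetical helper, for illustration only).

    n_reported  -- number of receptive anal sex experiences the person
                   said he had that day
    condom_used -- one entry per experience he described: True/False for
                   the condom item, or None if that item was left blank

    Returns (outcome, lower, upper), where outcome is the number of
    condomless experiences if every experience was reported on, else None.
    """
    observed = [c for c in condom_used if c is not None]
    condomless_seen = sum(1 for c in observed if not c)
    n_unknown = n_reported - len(observed)  # experiences with no condom data

    lower = condomless_seen              # every unknown experience used a condom
    upper = condomless_seen + n_unknown  # no unknown experience used a condom
    outcome = condomless_seen if n_unknown == 0 else None
    return outcome, lower, upper


# The example above: 2 experiences indicated, only 1 described (condom used).
print(day_outcome(2, [True]))         # (None, 0, 1)
# A fully reported day with one condomless experience.
print(day_outcome(2, [False, True]))  # (1, 1, 1)
# A "0" day, which is routed past these questions entirely.
print(day_outcome(0, []))             # (0, 0, 0)
```

The lower and upper bounds are not a substitute for a missing data model,
but they show how much information a partially reported day still carries.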

It isn't clear to us how we should deal with these missing data. One option
would be to impute at the item level (for the item "Did your partner use a
condom during this experience?"). We then could compute the outcome
variable and proceed as usual (although I have never used multiple
imputation with multilevel data!). Another option would be to treat the
outcome variable as missing but use as auxiliary variables all available
data on individual sexual experiences.
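
As a rough illustration of the first option (item-level imputation followed
by computing the outcome), here is a toy Python sketch. It is deliberately
simplistic and is not a recommendation: it uses only the condom items
observed on the same day, draws the probability of a condomless experience
from a Beta posterior (with a uniform prior that I am assuming purely for
illustration), and ignores the multilevel structure and any covariates. The
function name and arguments are hypothetical.

```python
import random
from typing import List, Optional


def impute_day_counts(
    n_reported: int,
    condom_used: List[Optional[bool]],
    prior_a: float = 1.0,  # assumed Beta prior, not taken from the study
    prior_b: float = 1.0,
    m: int = 5,            # number of imputed data sets
    seed: int = 0,
) -> List[int]:
    """Return m imputed values of the day's condomless count (toy model)."""
    rng = random.Random(seed)
    observed = [c for c in condom_used if c is not None]
    n_condomless = sum(1 for c in observed if not c)
    n_condom = len(observed) - n_condomless
    n_missing = n_reported - len(observed)

    counts = []
    for _ in range(m):
        # Draw the condomless probability from its posterior so that the
        # imputations reflect uncertainty about the probability itself.
        p = rng.betavariate(prior_a + n_condomless, prior_b + n_condom)
        # Impute each unreported experience as condomless with probability p.
        imputed = sum(1 for _ in range(n_missing) if rng.random() < p)
        counts.append(n_condomless + imputed)
    return counts


# Day with 2 experiences indicated, 1 described (condomless): each imputed
# count is either 1 or 2.
print(impute_day_counts(2, [False], m=5, seed=1))
```

Each of the m completed data sets would then be analyzed separately and the
results pooled. A real application would presumably need an imputation
model that respects the nesting of days within persons.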

I guess another option would be to do this in the context of a covariance
structure model (using FIML for missing data). In that case, we could
define a formative factor representing the outcome variable (number of
condomless receptive anal sex experiences), using all available data on
experiences from that day. Or, alternatively, we could code the outcome
variable as missing for people who did not report on all of their daily
experiences, and then use all available data on individual experiences as
auxiliary variables.

I would be grateful for any thoughts about how to best handle this
situation. In addition to the fact that we have missing data, the situation
is complicated by the facts that (a) the data are multilevel, (b) the
missing item responses are for a binary variable, and (c) the outcome is a
count variable.

Thanks in advance for your insights!
Jon

-- 
***Please note change of email to [email protected]***

Jonathan Mohr
Assistant Professor
Department of Psychology
Biology-Psychology Building
University of Maryland
College Park, MD 20742-4411

Office phone: 301-405-5907
Fax: 301-314-5966
Email: [email protected]
