Hey Rod,

Thanks for your response (and thanks to Steven, too). I have generated quite a few hits on the topic on the multilevel listserv.
Even though my variable clearly appears to be mixed, I am reluctant to analyze it as such because of concerns about causal interpretation. It seems to me that two-stage models run into more problems with causal interpretation than linear models do. I have pasted below the note that I sent out on the other listserv. Apologies to those who subscribe to both.

--Dave

Allow me, though, to add some information. I am analyzing a randomized trial. Because of this, I don't want to use any sort of mixture model, even though that seems to be the obvious best choice of model. The problem, as I see it, is that it is hard to give a causal interpretation to the conditional model (the model for the amount given a nonzero outcome). Perhaps it can be done with principal strata (in the terminology of Frangakis and Rubin), but I didn't want to get involved in that sort of work. So I was really interested in robustness studies for linear models. How well do they work when a zero-inflated model seems to be the more appropriate model but cannot be used because it cannot be given a causal interpretation without additional strong assumptions?

To see the problem with a zero-inflated model, consider a randomized trial of alternative smoking cessation interventions with an outcome of daily cigarette consumption after some interval of time. Suppose that never quitters smoke 25 a day under treatment and 40 a day under control, while compliers (who quit under treatment) smoke 15 a day under control. The true effect of treatment on never quitters is to reduce consumption by 15 a day. But under treatment the compliers quit, so the treated non-quitters are all never quitters, whereas the control non-quitters are a mix of never quitters and compliers. Depending on the relative frequency of never quitters and compliers, the average consumption among control non-quitters might therefore be lower than the average consumption among treated non-quitters, reversing the sign of the apparent effect.

A simple solution is to ignore the extra information about consumption among persistent smokers and just analyze the binary outcome of quitting or not quitting. That seems to be the common approach.
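The reversal in the smoking example can be checked with a few lines of arithmetic. This is a hypothetical sketch, assuming (as in the example) that compliers quit under treatment but not under control, that there are no defiers, and that anyone who quits in both arms drops out of both comparisons; `mean_among_nonquitters` and `share_never` are names made up here, not part of any package.

```python
# Stratum-specific consumption levels from the example (cigarettes per day).
NEVER_QUITTER_CONTROL = 40.0
NEVER_QUITTER_TREATMENT = 25.0
COMPLIER_CONTROL = 15.0  # compliers quit (0 per day) under treatment


def mean_among_nonquitters(share_never: float) -> tuple[float, float]:
    """Return (control, treatment) mean daily consumption among non-quitters.

    share_never is the (hypothetical) proportion of never quitters among
    never quitters plus compliers. Assumes compliers quit only under
    treatment and that people who quit in both arms are excluded from both.
    """
    # Control non-quitters: a mix of never quitters and compliers.
    control = share_never * NEVER_QUITTER_CONTROL + (1.0 - share_never) * COMPLIER_CONTROL
    # Treated non-quitters: compliers have quit, so only never quitters remain.
    treatment = NEVER_QUITTER_TREATMENT
    return control, treatment


if __name__ == "__main__":
    for share in (0.2, 0.4, 0.6):
        control, treatment = mean_among_nonquitters(share)
        print(f"share={share:.1f}: control={control:.1f}, treatment={treatment:.1f}")
    # share=0.2: control=20.0, treatment=25.0
    # share=0.4: control=25.0, treatment=25.0
    # share=0.6: control=30.0, treatment=25.0
```

Because the control mean is 15 + 25 × share_never, it falls below the treated mean of 25 whenever never quitters make up less than 40% of the never quitters plus compliers, even though treatment lowers every never quitter's consumption by 15 a day.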
But I thought that the reading intervention I am studying might both turn nonreaders into readers and help current readers become better readers. The logit model and the linear model each have a single location parameter, so each reduces the treatment effect to one number. Maybe I need to use a multilevel nonparametric test. Of course, then I lose a lot of power.

-----Original Message-----
From: Rod Little [mailto:[email protected]]
Sent: Monday, September 07, 2009 11:53 AM
To: David Judkins
Cc: [email protected]
Subject: Re: [Impute] RE: Impute Digest, Vol 47, Issue 1

David, the sequential regression program IVEware allows for "mixed" variable types like the one you describe. I think it multiply imputes using a two-stage model: one for presence/absence and then one for amount given presence.

Rod

On Mon, 7 Sep 2009, Gregorich, Steven wrote:

> Hi, David.
>
> It sounds like you are talking about a zero-inflated model, e.g., a zero-inflated Poisson (ZIP) or zero-inflated negative binomial (ZINB). You could also fit a zero-inflated normal model (ZIN; Joe Schafer and a colleague wrote a paper about that model in the late 1990s; they called it a 'two-part model'). I never studied the ZIN model closely, but presumably the normal part of the model would allow negative predicted values, which I would not be happy with in your application.
>
> You can fit a 2-level multilevel ZIP, ZIN, ZINB, etc., in PROC NLMIXED, and you can probably find related SAS code in the SAS-L archives, especially posts by Dale McLerran (sp?).
>
> HTH
>
> Steve
> ________________________________________
>
> Message: 1
> Date: Fri, 4 Sep 2009 14:25:33 -0400
> From: David Judkins <[email protected]>
> Subject: [Impute] Robustness of Multi-Level Modeling Software
> To: "[email protected]" <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="us-ascii"
>
> This is not an imputation question, but I don't know of a listserv for complex modeling questions. Maybe one of you will be able to help.
>
> Consider a mixed binary-normal distribution that results in a large point mass on the edge of an otherwise more-or-less normal distribution. An example is the number of alcoholic drinks per day. Cigarettes per day is another example. Or the number of reading questions answered correctly in a sample that contains a large number of children who can't read at all. The child reading example is my real concern because the children come grouped by school.
>
> Does anyone know of robustness studies of MLwiN, HLM, Mixed, Mplus, et cetera with respect to this radical departure from normality? I have heard it asserted that school-level departures from normality are more of a concern than student-level departures, but is this too much of a departure?
>
>
> David Judkins
> Senior Statistician
> Westat
> 1650 Research Boulevard
> Rockville, MD 20850
> (301) 315-5970
> [email protected]
> _______________________________________________
> Impute mailing list
> [email protected]
> http://lists.utsouthwestern.edu/mailman/listinfo/impute

___________________________________________________________________________________
Roderick Little
Professor and Chair, Department of Biostatistics
U-M School of Public Health
M4208 SPH II
1420 Washington Hgts
Ann Arbor, MI 48109-2029
Tel (734) 936 1003
Fax (734) 763 2215
email [email protected]
http://www.sph.umich.edu/~rlittle/
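Rod's description of a two-stage model (presence/absence, then amount given presence) matches the shape of the reading outcome in the original question, and it also shows why a single location parameter can hide which kind of effect an intervention has. The sketch below is a minimal, hypothetical two-part generator, not IVEware's code: the `TwoPartModel` class and all parameter values (60% readers, mean score 20, SD 4) are made up for illustration, the school grouping is ignored, and the normal second part is used only because it is the simplest choice (as Steve notes, a normal part can produce negative values).

```python
import random
from dataclasses import dataclass


@dataclass
class TwoPartModel:
    """Hypothetical two-part (semicontinuous) outcome for one trial arm.

    Stage 1 decides presence/absence (can the child read at all?); stage 2
    draws the score given presence. Nonreaders form a point mass at zero.
    """
    p_reader: float    # stage 1: probability that the child can read
    mean_score: float  # stage 2: mean score among readers
    sd_score: float    # stage 2: standard deviation of score among readers

    def draw(self, rng: random.Random) -> float:
        # Stage 1: presence/absence.
        if rng.random() >= self.p_reader:
            return 0.0
        # Stage 2: amount given presence. A normal draw is unbounded, so with
        # other parameter values it could go negative.
        return rng.gauss(self.mean_score, self.sd_score)

    def marginal_mean(self) -> float:
        # E[Y] = P(reader) * E[Y | reader]; the zeros add nothing to the mean.
        return self.p_reader * self.mean_score


if __name__ == "__main__":
    # All parameter values are made up for illustration.
    control = TwoPartModel(p_reader=0.60, mean_score=20.0, sd_score=4.0)
    lift_nonreaders = TwoPartModel(p_reader=0.75, mean_score=20.0, sd_score=4.0)
    improve_readers = TwoPartModel(p_reader=0.60, mean_score=25.0, sd_score=4.0)

    rng = random.Random(2009)
    sample = [control.draw(rng) for _ in range(10_000)]
    share_zero = sum(1 for y in sample if y == 0.0) / len(sample)
    print(f"control: {share_zero:.0%} zeros, sample mean {sum(sample) / len(sample):.2f}")

    for label, arm in [("lift nonreaders", lift_nonreaders), ("improve readers", improve_readers)]:
        diff = arm.marginal_mean() - control.marginal_mean()
        # Both arms report a difference in marginal means of 3.0.
        print(f"{label}: difference in marginal means = {diff:.1f}")
```

Raising the share of readers from 60% to 75% and raising readers' mean score from 20 to 25 both move the marginal mean from 12 to 15, so a linear model's mean difference of 3 cannot tell the two mechanisms apart, while a logit model for reading status would register only the first.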
