I want to apologize for what is more or less a cross-post. My question is not 
about imputation, but I hope you will not feel too offended, as it is 
a question about missing data. Since I do not know of any mailing list 
devoted to missing data in general, I think the members of this list are 
the ones with the closest knowledge of this area.

I determined joint probabilities for X1 and X2. Both dichotomous 
variables take the values 1 or 2. I then suppose that X1 is always observed, 
but X2 may be missing. The probability that X2 is missing (M2=1) is 
larger for X2=1 than for X2=2, and similarly for X1=1 compared with X1=2. 
There are 3 independent parameters for the probabilities of the joint 
distribution of (X1,X2) and 2 independent parameters for the probabilities 
of the conditional distribution of M2 given (X1,X2) - actually, of M2 
given X2 only, because it is the same for both levels of X1. Since I 
observe 4 frequencies for the joint categorization in (X1,X2) plus 2 
frequencies for the categorization by X1 alone (when X2 is missing), 
assuming a multinomial distribution for these data with the sum of the 
6 frequencies fixed, I have 5 degrees of freedom.

I think this is the simplest nonignorable (MNAR) response mechanism for 
partially categorized 2x2 data whose parameters are estimable by 
maximum likelihood. I am trying to gain more insight into these 
structures, so I simulated data from the multinomial joint distribution of 
(X1,X2,M2) and then tried to fit the same structure by ML.
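The simulation step is essentially the following (the parameter values here 
are just placeholders, not the ones I actually used):

set.seed(1)
n <- 500
prob <- obs.cell.probs(p11 = 0.35, p12 = 0.15, p21 = 0.25,
                       phi1 = 0.40, phi2 = 0.20)
counts <- setNames(as.vector(rmultinom(1, size = n, prob = prob)),
                   names(prob))   # the 6 observed frequencies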

The marginal probabilities of X1 and X2 that I chose are not 
homogeneous. One of my interests is to compare the result of a test of 
marginal homogeneity under the correct MNAR structure versus a MAR 
structure or versus discarding the missing data (complete cases, CC). I 
know that the last two approaches lead to biases (for the probabilities 
that I chose, both should show almost marginal homogeneity), but because 
the variances of the estimated joint probabilities of (X1,X2) are so much 
more inflated under the MNAR structure than under MAR or CC, it is usually 
difficult to reach stronger conclusions of marginal heterogeneity with the 
MNAR approach than with the MAR/CC approaches unless the sample size is 
larger than n=500. I find that a bit large, because there are no problems 
with sampling zeros even at a sample size of n=100. I ran 10,000 
simulations from the fixed multinomial parameters for each sample size 
and then estimated the power of the likelihood ratio test of marginal 
homogeneity.
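To fix ideas, the two fits and the LR test of marginal homogeneity amount to 
something like the sketch below. Just for illustration I maximize the 
observed-data log-likelihood with optim() on an unconstrained 
reparameterization; my actual fitting procedure (nlmP followed by EM) is 
described further down.

## Observed-data log-likelihood for the 6 multinomial counts.
loglik <- function(p11, p12, p21, phi1, phi2, counts) {
  sum(counts * log(obs.cell.probs(p11, p12, p21, phi1, phi2)))
}

## Unrestricted MNAR fit: theta = (3 multinomial-logit terms for the (X1,X2)
## cells with p22 as baseline, logit(phi1), logit(phi2)).
negll.full <- function(theta, counts) {
  p <- exp(c(theta[1:3], 0)); p <- p / sum(p)
  -loglik(p[1], p[2], p[3], plogis(theta[4]), plogis(theta[5]), counts)
}

## Restricted fit under marginal homogeneity, which here reduces to p12 = p21.
negll.mh <- function(theta, counts) {
  p <- exp(c(theta[1], theta[2], theta[2], 0)); p <- p / sum(p)
  -loglik(p[1], p[2], p[3], plogis(theta[3]), plogis(theta[4]), counts)
}

fit.full <- optim(rep(0, 5), negll.full, counts = counts, method = "BFGS")
fit.mh   <- optim(rep(0, 4), negll.mh,   counts = counts, method = "BFGS")

lr.mh <- 2 * (fit.mh$value - fit.full$value)        # LR statistic, 1 df
p.mh  <- pchisq(lr.mh, df = 1, lower.tail = FALSE)  # p-value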

I was trying to get an idea of the sample size at which adopting a MNAR 
structure may actually make a difference. But the problem (sorry, I had to 
introduce all this information first) is with this estimation of the power 
of the test. This *saturated* MNAR model (0 degrees of freedom) has a 
likelihood ratio goodness-of-fit statistic greater than 0.2 for 27% of the 
simulated data sets at sample sizes of n=50/100 and for 14% at n=5,000, 
which is unacceptable. I noticed that almost every time this happens, one 
of the estimated parameters of the conditional distribution of M2 given X2 
is on the boundary of the parameter space, but there are some exceptions.
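For reference, the goodness-of-fit statistic I am talking about is computed 
roughly as below (continuing from the sketch above):

## LR goodness of fit of the saturated MNAR fit against the unrestricted
## multinomial on the 6 observed cells; 0*log(0) cells are dropped.
theta  <- fit.full$par
p.hat  <- exp(c(theta[1:3], 0)); p.hat <- p.hat / sum(p.hat)
fitted <- n * obs.cell.probs(p.hat[1], p.hat[2], p.hat[3],
                             plogis(theta[4]), plogis(theta[5]))
G2 <- 2 * sum(counts * log(counts / fitted), na.rm = TRUE)

With 0 degrees of freedom I would expect G2 to be essentially zero, which is 
exactly what fails to happen in the cases above.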

I am not using the Newton-Raphson, Fisher scoring or Louis Turbo EM 
algorithms because they lead to estimated parameters outside the parameter 
space for many simulated data sets. I am using the function nlmP from the 
R package geoR, which is a Newton-type minimizer similar to the function 
nlm in base R but which allows constraints on the parameters. This function 
usually gives estimates close to those of the plain EM algorithm, but 
converges much faster than EM. Still, I chose to use the estimates from 
nlmP as initial values for EM and then apply a convergence criterion of 
1e-8 on the difference between successive estimates.
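For completeness, my understanding of the EM refinement for this structure 
is roughly the sketch below (not my exact code): the E-step splits the 
X2-missing counts over X2=1,2 and the M-step refits p and phi from the 
completed table.

## counts is ordered as in obs.cell.probs(): (o11, o12, o21, o22, m1, m2);
## p is the 2x2 matrix of joint probabilities of (X1,X2), phi = (phi1, phi2).
em.fit <- function(counts, p, phi, tol = 1e-8, maxit = 10000) {
  nobs <- matrix(counts[1:4], 2, 2, byrow = TRUE)  # fully observed (X1,X2) counts
  m    <- counts[5:6]                              # X2 missing, by level of X1
  for (it in 1:maxit) {
    ## E-step: expected split of m[i] over X2=j, proportional to p[i,j]*phi[j]
    w    <- sweep(p, 2, phi, "*")
    nmis <- m * w / rowSums(w)
    ## M-step: complete-data ML estimates
    ncom    <- nobs + nmis
    p.new   <- ncom / sum(counts)
    phi.new <- colSums(nmis) / colSums(ncom)
    if (max(abs(c(p.new - p, phi.new - phi))) < tol) break
    p <- p.new; phi <- phi.new
  }
  list(p = p.new, phi = phi.new, iterations = it)
}

For instance, starting from the estimates of the earlier optim sketch this 
would be em.fit(counts, p = matrix(p.hat, 2, 2, byrow = TRUE), 
phi = plogis(theta[4:5])).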

I repeated this exercise with a variety of values for the 2 
conditional probabilities of M2 given X2: (i) varying from the middle to 
the boundary of the parameter space, and (ii) with larger and smaller 
differences between the probabilities of M2=1 given X2=1 and of M2=1 
given X2=2. The number of "anomalies" observed is almost the same.

Fay (1986) and Molenberghs et al. (1999) note the same problems of 
estimated parameters near the boundary of the parameter space and of 
saturated MNAR structures with an LR goodness of fit greater than zero, 
but there is no elucidating discussion of the causes.

Fay, R.E. (1986). Causal models for patterns of nonresponse. Journal of 
the American Statistical Association 81, 354-365.

Molenberghs, G., Goetghebeur, E.J.T., Lipsitz, S.R. and Kenward, M.G. 
(1999). Nonrandom missingness in categorical data: strengths and 
limitations. The American Statistician 53, 110-118.

I believe (or used to believe) that structures with these problems should 
not be considered further in the analysis, but since I am simulating from 
this very structure, I cannot understand what the problem is.

I would appreciate any help, insights, other references or discussion 
about problems with the maximum likelihood estimation of MNAR models.

If I get interesting results, I may use them as part of my master's 
dissertation.

Thanks in advance for your patience with this long read, and pardon my 
grammar mistakes (it has been a long time since I last practiced writing 
in English); I hope they have not hindered the understanding of the 
problem.

-- 
Frederico Zanqueta Poleto
[email protected]
--
"An approximate answer to the right problem is worth a good deal more than an 
exact answer to an approximate problem." J. W. Tukey