Jan,

Thierry is correct in saying that you are misusing glm(), but there is also a 
numerical problem.

You are misusing glm() because your model specification claims to have 
Binomial(n,p) observations with w in the vicinity of 100, where there is a 
single common p but the observed binomial proportion is either 1 or 0, never 
anything in between.  These data are a very poor fit to a binomial model.

The correct specification if you have what you call replicate weights and I call 
frequency weights is to produce a single data record for each covariate pattern that has 
both the 1 and 0 observations. This can either be two columns for successes and failures, 
or one column of proportions and one column of weights.  As your quote from MASS says 
"weights are used to give the number of trials when the response is the proportion 
of successes." In your data the response is *not* the proportion of successes.


However, the MLE should still be equal to the weighted mean even with this 
misuse.  The reason it is not is because of the starting values.  R has to find 
some starting values for the iterative maximization of the likelihood, and for 
binomial data with y successes out of n it uses  starting values for the fitted 
means of  (y+0.5)/(n+1).  Starting the iteration at the data in this way 
usually makes the Fisher scoring algorithm very reliable -- it is correctly 
scaled to the data, in some sense.   Unfortunately, if you separate out the 
successes and failures, you have some points starting with values very close to 
0.  When I used your code the starting value for the point with the largest 
weight was 0.5/199.   At iteration 2, the estimated mean ends up very small for 
all observations, and then the iteration diverges.  However, if you provide a 
starting value then the fitting works, even if you start the iteration at, say 
beta=1, corresponding to a fitted mean of over 70%.

So, the result is wrong in the sense that it is not the mle, because of a 
failure of convergence, which happens because specifying the weights the way 
you did rather than the documented way leads to bad default starting values for 
the iteration.  You need either to specify the data as recommended or supply 
starting values.

    =thomas


On Fri, 16 Apr 2010, Jan van der Laan wrote:

I have some questions about the use of weights in binomial glm as I am
not getting the results I would expect. In my case the weights I have
can be seen as 'replicate weights'; one respondent i in my dataset
corresponds to w[i] persons in the population. From the documentation
of the glm method, I understand that the weights can indeed be used
for this: "For a binomial GLM prior weights are used to give the
number of trials when the response is the proportion of successes."
From "Modern applied statistics with S-Plus 3rd ed." I understand the
same.

However, I am getting some strange results. I generated an example:

Generate some data which is simular to my dataset
Z <- rbinom(1000, 1, 0.1)
W <- round(rnorm(1000, 100, 40))
W[W < 1] <- 1

Probability of success can either be estimated using:
sum(Z*W)/sum(W)
[1] 0.09642109

Or using glm:
model <- glm(Z ~ 1, weights=W, family=binomial())
Warning message:
In glm.fit(x = X, y = Y, weights = weights, start = start, etastart =
etastart,  :
 fitted probabilities numerically 0 or 1 occurred
predict(model, type="response")[1]
          1
2.220446e-16

These two results are obviously not the same. The strange thing is
that when I scale the weights, such that the total equals one, the
probability is correctly estimated:

model <- glm(Z ~ 1, weights=W/sum(W), family=binomial())
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
predict(model, type="response")[1]
        1
0.09642109


However scaling of the weights should, as far as I am aware, not have
an effect on the estimated parameters. I also tried some other
scalings. And, for example scaling the weights by 20 also gives me the
correct result.

model <- glm(Z ~ 1, weights=W/20, family=binomial())
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
predict(model, type="response")[1]
        1
0.09642109


Am I misinterpreting the weights? Could this be a numerical problem?

Regards,

Jan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to