> On Dec 29, 2015, at 5:33 AM, Frank van Berkum <frankieboy...@hotmail.com> 
> wrote:
> 
> Hi all,
> My problem is the following. 
> Suppose I have a dataset with observations Y and explanatory variables X1, 
> ..., Xn, and suppose one of these explanatory variables is geographical area 
> (of which there are ten, j=1,...,10).  For some observations I know the area, 
> but for others it is unknown and therefore recorded as NA.
> I want to estimate a model of the form Y[i] ~ Poisson( lambda[i] ) with 
> log(lambda[i]) = constant + \sum_j I[!is.na(area[i])] * I[area[i]==j] * 
> beta[j]
> In words: we estimate a constant for all observations and a factor for each 
> area. If it is unknown what the area is, we only include the constant. 
> When estimating this model using glm(), the records with is.na(area[i]) are 
> 'deleted' from the dataset, and this I do not want. I had hoped that the 
> model as described above could be estimated using the function I() (interpret 
> as), but so far my attempts have not succeeded. 
> Any help on how to approach this is kindly appreciated.
> Kind regards,
> Frank van Berkum                                        
>       [[alternative HTML version deleted]]

I don't understand why you don't just recode the NA's to "unknown" and redo the 
model. This code is untested, but I think it demonstrates the two steps needed: 
add a level and then recode the NA values, assuming this factor is named `X6`:

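# Step 1: add "unknown" to the existing levels (assumes X6 is already a factor)
dat$X6 <- factor(dat$X6, levels=c( levels(dat$X6), "unknown") )
# Step 2: recode the NA values to the new level
dat$X6[ is.na(dat$X6) ] <- "unknown"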


(I think this might be a bit quicker than Rolf's approach.)
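
Here is a small sketch of those two steps followed by the fit, run on made-up 
data. The data frame `dat`, the response `Y`, the area labels "A1" to "A10" and 
the share of missing areas are all assumptions for illustration, not taken from 
the original post. It only checks that glm() keeps every row once the NA values 
have become a level of their own:

```r
set.seed(1)

# Hypothetical data: ten areas, 40 of 200 rows with unknown area (NA)
n <- 200
area <- factor(sample(paste0("A", 1:10), n, replace = TRUE))
area[sample(n, 40)] <- NA
dat <- data.frame(Y = rpois(n, lambda = 3), X6 = area)

# Step 1: add the extra level; step 2: recode the NA values
dat$X6 <- factor(dat$X6, levels = c(levels(dat$X6), "unknown"))
dat$X6[is.na(dat$X6)] <- "unknown"

# Poisson regression with the log link (the default for this family)
fit <- glm(Y ~ X6, family = poisson, data = dat)

# TRUE if no rows were dropped for missingness
nobs(fit) == nrow(dat)
```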


In R the default contrasts are "treatment" (which is different from the 
contrasts you describe) and so each area will be referenced to the first area 
(and the first level of all other factors) in the lexical ordering of area 
names. This ordering and the contrast type can be changed. There are many 
postings on R-help over the years demonstrating how to do this.
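
As a rough illustration of changing the reference level and the contrast type, 
the sketch below uses a small made-up factor (the names `area`, `area2`, 
`area3` and the level labels are hypothetical). If the aim is for rows with an 
unknown area to receive only the constant, one option seems to be keeping 
treatment contrasts and making "unknown" the reference level:

```r
# Hypothetical factor that already has an "unknown" level
area <- factor(c("north", "south", "unknown", "north", "east", "unknown"))
levels(area)        # lexical ordering, so "east" is the default reference

# Make "unknown" the reference level
area2 <- relevel(area, ref = "unknown")
levels(area2)
contrasts(area2)    # treatment contrasts: the "unknown" row is all zeros

# Alternatively, switch this factor to sum-to-zero contrasts
area3 <- area2
contrasts(area3) <- contr.sum(nlevels(area3))
contrasts(area3)
```

With `area2` in a model formula, the intercept would apply to the "unknown" 
rows and each remaining coefficient would be that area's offset from it.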

Study these and the basics of R before reposting and if you do so, then post in 
plain text and include a small example constructed in R:

?factor
?C
?contrasts
?contr.sum

-- 

David Winsemius
Alameda, CA, USA

