Re: [R] indicator or deviation contrasts in log-linear modelling

2009-02-19 Thread Michael Friendly

Maja,

The need to interpret parameters in log-linear models (and therefore the
need to understand how the model is parameterized) often vanishes if you
visualize the fitted model or its residuals in a mosaic display.

For example, ucb1 asserts that Admit is jointly independent of Gender and
Dept. It fits very badly, but the residuals show the *nature* of the
association that is not accounted for.
ucb2 asserts that Admit and Gender are conditionally independent, given
Dept. It fits badly overall, but only in one department.


> library(MASS)   # loglm() lives in MASS
> library(vcd)    # vcd supplies the mosaic-display plot() method for loglm fits
> ucb1 <- loglm(~Admit + Gender*Dept, data=UCBAdmissions)
> ucb1
Call:
loglm(formula = ~Admit + Gender * Dept, data = UCBAdmissions)

Statistics:
                 X^2 df P(> X^2)
Likelihood Ratio 877 11        0
Pearson          798 11        0
> plot(ucb1)
> ucb2 <- loglm(~Admit*Dept + Gender*Dept, data=UCBAdmissions)
> ucb2
Call:
loglm(formula = ~Admit * Dept + Gender * Dept, data = UCBAdmissions)

Statistics:
                 X^2 df P(> X^2)
Likelihood Ratio  22  6   0.0014
Pearson           20  6   0.0028
> plot(ucb2)
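A minimal base-R sketch of the claim that ucb2 fits badly "only in one department", assuming only stats::loglin() rather than the MASS/vcd functions above (the object names are illustrative). It refits the same conditional-independence model and splits Pearson's X^2 into per-department contributions; the largest share should belong to department A.

```r
# Conditional independence of Admit and Gender given Dept (the ucb2 model).
# Dimensions of UCBAdmissions: 1 = Admit, 2 = Gender, 3 = Dept, so the
# fitted margins are {Admit, Dept} = c(1, 3) and {Gender, Dept} = c(2, 3).
fit2 <- loglin(UCBAdmissions, margin = list(c(1, 3), c(2, 3)),
               fit = TRUE, print = FALSE)

# Pearson residuals (observed - fitted) / sqrt(fitted), same shape as the table
pearson_res <- (UCBAdmissions - fit2$fit) / sqrt(fit2$fit)

# Each department's share of the overall Pearson X^2
dept_contrib <- apply(pearson_res^2, 3, sum)
round(dept_contrib, 2)

# The shares add up to the overall statistic reported for ucb2
c(total = sum(dept_contrib), loglin = fit2$pearson)
```

A mosaic plot shows the same thing graphically; the numeric split is just easier to quote.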



--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] indicator or deviation contrasts in log-linear modelling

2009-02-19 Thread Charles C. Berry

On Wed, 18 Feb 2009, maiya wrote:



I realise that in the case of loglin the parameters are calculated post
festum from the cell frequencies; however, other programmes that use
Newton-Raphson as opposed to IPF work the other way round, right?
In that case one would expect the parameter output to be limited to the
particular contrast used. But since loglin uses IPF, I would have thought the
style of parameters to be output could be chosen...
Anyway, this is the line that interests me:


lm( log( as.vector( loglin(...,fit=TRUE)$fit ) ) ~  your favored contrasts  )


only I'm not proficient enough in R to figure out the last term :(
How would I go about this if my preferred contrast sets the first
categories as reference categories?


See 'An Introduction to R', Chapter 11 (Statistical models in R), in particular the section on contrasts,

and try this:


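 op <- options("contrasts")               # remember the current setting
 for (i in ls('package:stats', pattern = 'contr[.]')) {
   cat(i, '\n')
   print(get(i)(letters[1:5]))            # coding matrix for a 5-level factor
   options(contrasts = c(unordered = i, ordered = 'contr.poly'))
   print(coef(glm(Freq ~ Dept*Gender, data = as.data.frame(UCBAdmissions),
                  family = poisson)))     # same model, different coefficients
 }
 options(op)                              # restore the previous setting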




I literally just need the equivalent of

loglin(matrix(c(1,2,3,4), nrow=2), list(c(1,2)), param=TRUE)

which would give me parameters under indicator contrasts. glm... well, I'd
have to work on it.

Regarding the more general points:

ad 2) I would have thought that direct inspection of cell frequencies is
precisely the wrong/misleading thing to do: the highest-order coefficients
can be inspected directly in order to see the interaction without the
(lower) marginal effects, or alternatively the table can be standardized to
uniform margins for the same sort of inspection.


OK, to each her own.

But try this out yourself. What is the story here?

(Review ?UCBAdmissions, if you need to.)


options(contrasts=c(unordered='contr.sum',ordered='contr.poly'))
print( cbind(coef( glm( Freq~ Admit*Dept*Gender,
                        data=as.data.frame(UCBAdmissions),family=poisson)) ))
                             [,1]
(Intercept)           4.786575880
Admit1               -0.277614562
Dept1                 0.067824911
Dept2                -0.758615446
Dept3                 0.560293364
Dept4                 0.446131873
Dept5                -0.001254892
Gender1               0.355262130
Admit1:Dept1          0.786694268
Admit1:Dept2          0.599494828
Admit1:Dept3         -0.021374963
Admit1:Dept4         -0.053867688
Admit1:Dept5         -0.250913079
Admit1:Gender1       -0.050744703
Dept1:Gender1         0.782600986
Dept2:Gender1         1.216370861
Dept3:Gender1        -0.646880514
Dept4:Gender1        -0.308737151
Dept5:Gender1        -0.691810320
Admit1:Dept1:Gender1 -0.212274286
Admit1:Dept2:Gender1 -0.004260932
Admit1:Dept3:Gender1  0.081975109
Admit1:Dept4:Gender1  0.030247904
Admit1:Dept5:Gender1  0.100791458




OK, got the whole story? Could you explain it to someone who is not a 
statistician?


Now try it again, but with this display:


ftable(UCBAdmissions)

                Dept   A   B   C   D   E   F
Admit    Gender
Admitted Male        512 353 120 138  53  22
         Female       89  17 202 131  94  24
Rejected Male        313 207 205 279 138 351
         Female       19   8 391 244 299 317

round( ftable(prop.table(UCBAdmissions,2:3)) ,2)

                Dept    A    B    C    D    E    F
Admit    Gender
Admitted Male        0.62 0.63 0.37 0.33 0.28 0.06
         Female      0.82 0.68 0.34 0.35 0.24 0.07
Rejected Male        0.38 0.37 0.63 0.67 0.72 0.94
         Female      0.18 0.32 0.66 0.65 0.76 0.93


You can pretty easily see that admission rates vary by department, that
all departments but one have pretty equal admission rates by gender, and
that in that department (A) the admission rate is about 20 percentage
points higher for females (0.82 vs. 0.62). (And yes, a significance test
confirms this.)
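Chuck does not say which significance test he ran. As one plausible choice (an assumption here, not necessarily his), Fisher's exact test of Admit x Gender can be run separately within each department; only department A should come out clearly significant. The object name p_by_dept is illustrative.

```r
# Admission rate by gender within each department (the 'Admitted' rows above)
round(prop.table(UCBAdmissions, c(2, 3))["Admitted", , ], 2)

# Fisher's exact test of Admit x Gender, one 2 x 2 table per department
p_by_dept <- sapply(dimnames(UCBAdmissions)$Dept,
                    function(d) fisher.test(UCBAdmissions[, , d])$p.value)
signif(p_by_dept, 2)
```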


Maybe not as statistified as talking about three-way interactions and
coefficients of products of contrasts, but I'll bet a lot of scientists
would find the tables more compelling.


HTH,

Chuck







Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


[R] indicator or deviation contrasts in log-linear modelling

2009-02-18 Thread maiya

I am fairly new to log-linear modelling, so rather than trying to fit
models, I am still trying to figure out how it actually works - hence I am
looking at the interpretation of parameters. Now it seems most people skip
this part and go directly to measuring model fit, so I am finding very few
references to actual parameters, and I am of course clear on the fact that
their choice is irrelevant for the actual model fit.

But here is my question: loglin uses deviation contrasts, so the
coefficients in each term add up to zero.
Another option is indicator contrasts, where a reference category is chosen
in each term and set to zero, while the others are relative to it. My
question is whether there is a log-linear command equivalent to loglin that
uses this second, dummy-coding style of constraints (I know e.g. SPSS GENLOG
does this).
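To make the two coding schemes concrete: R's contrast functions contr.sum() (deviation coding, effects sum to zero) and contr.treatment() (indicator coding, first level as reference) produce the coding matrices below, and loglin(..., param = TRUE) reports its parameters in the sum-to-zero form. A small sketch, using the 2 x 2 table Maja gives in her follow-up; dev_param is an illustrative name.

```r
# Deviation (effect) coding for a 3-level factor: the last level is coded -1
# in every column, so the level effects are constrained to sum to zero.
contr.sum(3)

# Indicator (dummy) coding: the first level is the reference, coded 0 throughout.
contr.treatment(3)

# loglin() reports parameters in the sum-to-zero form: for the saturated
# model on a 2 x 2 table, every term's parameters sum to zero.
tab <- matrix(c(1, 2, 3, 4), nrow = 2)
dev_param <- loglin(tab, list(c(1, 2)), param = TRUE, print = FALSE)$param
str(dev_param)
sapply(dev_param[-1], sum)   # main effects and interaction: all (numerically) 0
```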

I hope this is not too basic a question!

And if anyone is up for answering the wider question of why log-linear
parameters are not something to be looked at - which might just be my
impression of the literature - feel free to comment!

Thanks for your help!

Maja
-- 
View this message in context: 
http://www.nabble.com/indicator-or-deviation-contrasts-in-log-linear-modelling-tp22090104p22090104.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] indicator or deviation contrasts in log-linear modelling

2009-02-18 Thread Charles C. Berry

On Wed, 18 Feb 2009, maiya wrote:



I am fairly new to log-linear modelling, so rather than trying to fit
models, I am still trying to figure out how it actually works - hence I am
looking at the interpretation of parameters. Now it seems most people skip
this part and go directly to measuring model fit, so I am finding very few
references to actual parameters, and I am of course clear on the fact that
their choice is irrelevant for the actual model fit.

But here is my question: loglin uses deviation contrasts,


Depends on what you mean by 'uses'.


From ?loglin


QUOTE:

Details

The Iterative Proportional Fitting algorithm as presented in Haberman 
(1972) is used for fitting the model. At most iter iterations are 
performed, convergence is taken to occur when the maximum deviation 
between observed and fitted margins is less than eps. All internal 
computations are done in double precision; there is no limit on the number 
of factors (the dimension of the table) in the model.


END QUOTE

There are no explicit contrasts in IPF. The $param component returned when 
'param=TRUE' is used is derived from the estimated cell frequencies. You 
can transform these to other basis vectors. If there are no structural 
zeros,


lm( log( as.vector( loglin(...,fit=TRUE)$fit ) ) ~  your favored contrasts  )

will give you estimates under your favored scheme.
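A runnable sketch of that recipe, under two stated assumptions: the response is the *log* of the fitted counts (log-linear parameters live on the log scale, which is also why structural zeros get in the way), and the "favored contrasts" are R's indicator coding, contr.treatment, applied to the Admit*Dept + Gender*Dept model (Friendly's ucb2). Because the logged fitted values lie exactly in the column space of the model matrix, lm() fits them without error, and its coefficients should agree with a Poisson glm() fit of the same model. Object names are illustrative.

```r
# IPF fit of Admit*Dept + Gender*Dept (dims: 1 = Admit, 2 = Gender, 3 = Dept)
ipf <- loglin(UCBAdmissions, margin = list(c(1, 3), c(2, 3)),
              fit = TRUE, print = FALSE)

# One row per cell: factors Admit, Gender, Dept plus the fitted count in Freq
cells <- as.data.frame(as.table(ipf$fit))

# Indicator coding: the first level of each factor is the reference category
op <- options(contrasts = c(unordered = "contr.treatment", ordered = "contr.poly"))

# Regress log fitted counts on the model's design; the fit is exact
via_lm <- lm(log(Freq) ~ Admit * Dept + Gender * Dept, data = cells)

# The same model via the surrogate Poisson GLM on the observed counts
via_glm <- glm(Freq ~ Admit * Dept + Gender * Dept, family = poisson,
               data = as.data.frame(UCBAdmissions))
options(op)  # restore the previous contrasts setting

round(cbind(lm = coef(via_lm), glm = coef(via_glm)), 4)
```

Swapping "contr.treatment" for "contr.sum" in the options() call gives the deviation-coded version of the same numbers.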

There is also the surrogate Poisson approach, which will do this as well.

so the coefficients in each term add up to zero.
Another option is indicator contrasts, where a reference category is chosen
in each term and set to zero, while the others are relative to it. My
question is whether there is a log-linear command equivalent to loglin that
uses this second, dummy-coding style of constraints (I know e.g. SPSS GENLOG
does this).


Yep, glm(). See McCullagh, P. and Nelder, J. A. (1989), Generalized Linear
Models, London: Chapman and Hall, for details on surrogate Poisson
modelling.
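A small sketch of the surrogate Poisson idea, assuming the Admit*Dept + Gender*Dept model (Friendly's ucb2) for illustration: fitting it with glm() under indicator coding and under deviation coding gives the same deviance and the same fitted counts, and that deviance equals loglin()'s likelihood-ratio statistic. Only the coefficients, i.e. the parameterization, change. The coding is set here through glm()'s contrasts argument rather than options().

```r
dat  <- as.data.frame(UCBAdmissions)   # one row per cell, counts in Freq
form <- Freq ~ Admit * Dept + Gender * Dept

# Indicator (reference-category) coding
g_ind <- glm(form, family = poisson, data = dat,
             contrasts = list(Admit = "contr.treatment", Gender = "contr.treatment",
                              Dept = "contr.treatment"))

# Deviation (sum-to-zero) coding, the form loglin() reports
g_dev <- glm(form, family = poisson, data = dat,
             contrasts = list(Admit = "contr.sum", Gender = "contr.sum",
                              Dept = "contr.sum"))

# The same model by iterative proportional fitting
ipf <- loglin(UCBAdmissions, list(c(1, 3), c(2, 3)), print = FALSE)

c(loglin = ipf$lrt, glm_indicator = deviance(g_ind), glm_deviation = deviance(g_dev))
all.equal(fitted(g_ind), fitted(g_dev), tolerance = 1e-6)   # same fitted counts
```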




I hope this is not too basic a question!

And if anyone is up for answering the wider question of why log-linear
parameters are not something to be looked at - which might just be my
impression of the literature - feel free to comment!



I can think of three:

1) IPF doesn't need the parameters to do its work and do tests based on
   loglinear models. The canonical reference is Bishop, Fienberg, and
   Holland's Discrete Multivariate Analysis, 1975.

2) In many applications, direct inspection of the cell frequencies or
   their estimates is quite natural.

3) Often there are higher-order effects (a four-way table with three-way
   interactions, say), so the lower-order parameter values are not easily
   interpreted anyway.
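One way to see point 3, as a sketch using the saturated model for UCBAdmissions (chosen here only for illustration): the fitted counts are identical under both codings, but the Gender "main effect" changes meaning. Under indicator coding, GenderFemale is the log ratio of female to male counts in the reference cell (Admitted, Dept A) only, log(89/512); under deviation coding, Gender1 is half the average log male-to-female ratio over all twelve Admit x Dept cells.

```r
dat  <- as.data.frame(UCBAdmissions)
form <- Freq ~ Admit * Dept * Gender   # saturated: reproduces every cell

# Indicator coding (first level of each factor is the reference category)
g_ind <- glm(form, family = poisson, data = dat,
             contrasts = list(Admit = "contr.treatment", Dept = "contr.treatment",
                              Gender = "contr.treatment"))
# Deviation coding (effects sum to zero), as in the contr.sum output earlier
g_dev <- glm(form, family = poisson, data = dat,
             contrasts = list(Admit = "contr.sum", Dept = "contr.sum",
                              Gender = "contr.sum"))

# Identical fitted counts under both codings ...
all.equal(fitted(g_ind), fitted(g_dev), tolerance = 1e-6)

# ... but the Gender "main effect" answers a different question in each
c(indicator = unname(coef(g_ind)["GenderFemale"]),  # log(89/512): Admitted, Dept A only
  deviation = unname(coef(g_dev)["Gender1"]))       # half the mean log(Male/Female)
```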

HTH,

Chuck











Re: [R] indicator or deviation contrasts in log-linear modelling

2009-02-18 Thread maiya

I realise that in the case of loglin the parameters are calculated post
festum from the cell frequencies; however, other programmes that use
Newton-Raphson as opposed to IPF work the other way round, right?
In that case one would expect the parameter output to be limited to the
particular contrast used. But since loglin uses IPF, I would have thought the
style of parameters to be output could be chosen...
Anyway, this is the line that interests me:

   lm( log( as.vector( loglin(...,fit=TRUE)$fit ) ) ~  your favored contrasts  )

only I'm not proficient enough in R to figure out the last term :(
How would I go about this if my preferred contrast sets the first
categories as reference categories?

I literally just need the equivalent of

loglin(matrix(c(1,2,3,4), nrow=2), list(c(1,2)), param=TRUE)

which would give me parameters under indicator contrasts. glm... well, I'd
have to work on it.
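A sketch of one way to get indicator-coded parameters for that 2 x 2 example. The dimension names row/col and the level labels are made up for illustration. The saturated Poisson glm() uses first-level-as-reference coding by default, and the same numbers can be recovered by hand from loglin()'s sum-to-zero parameters, which is the "transform these to other basis vectors" step Chuck mentions.

```r
tab <- matrix(c(1, 2, 3, 4), nrow = 2,
              dimnames = list(row = c("r1", "r2"), col = c("c1", "c2")))

# Deviation (sum-to-zero) parameters, as loglin() reports them:
# log(mu_ij) = mu + a_i + b_j + ab_ij
dev <- loglin(tab, list(c(1, 2)), param = TRUE, print = FALSE)$param
mu <- dev[[1]]; a <- dev[[2]]; b <- dev[[3]]; ab <- dev[[4]]

# Re-express them with r1 and c1 as reference categories
by_hand <- c(mu + a[1] + b[1] + ab[1, 1],                 # log(mu_11)
             (a[2] - a[1]) + (ab[2, 1] - ab[1, 1]),       # row r2 vs r1 at c1
             (b[2] - b[1]) + (ab[1, 2] - ab[1, 1]),       # col c2 vs c1 at r1
             ab[2, 2] - ab[2, 1] - ab[1, 2] + ab[1, 1])   # log odds ratio
names(by_hand) <- c("(Intercept)", "rowr2", "colc2", "rowr2:colc2")

# The same parameters from a saturated Poisson glm with indicator coding
cells <- as.data.frame(as.table(tab))   # columns row, col, Freq
ind   <- glm(Freq ~ row * col, family = poisson, data = cells,
             contrasts = list(row = "contr.treatment", col = "contr.treatment"))

cbind(by_hand = by_hand, glm = coef(ind))
```

For this table the values are log(1), log(2), log(3) and log(4/6).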

Regarding the more general points:

ad 2) I would have thought that direct inspection of cell frequencies is
precisely the wrong/misleading thing to do: the highest-order coefficients
can be inspected directly in order to see the interaction without the
(lower) marginal effects, or alternatively the table can be standardized to
uniform margins for the same sort of inspection.

ad 3) and yes, I figured as much! I can't see how lower-order terms can be
interpreted at all if higher-order interactions exist. I've seen it done,
e.g. I've seen it claimed that in a standardized table the lower-order terms
are all equal to zero, which is of course not true?

Thanks!
Maja



-- 
View this message in context: 
http://www.nabble.com/indicator-or-deviation-contrasts-in-log-linear-modelling-tp22090104p22093070.html
Sent from the R help mailing list archive at Nabble.com.
