Re: [R] indicator or deviation contrasts in log-linear modelling
Maja,

The need to interpret parameters in log-linear models (and therefore, the need to understand how the model is parameterized) often vanishes if you visualize the fitted model or the residuals in a mosaic display. For example:

ucb1 asserts that Admit is jointly independent of Gender and Dept. It fits very badly, but the residuals show the *nature* of the association not accounted for.

ucb2 asserts that Admit and Gender are conditionally independent, given Dept. It fits badly overall, but only in one department.

    > library(MASS)   # provides loglm()
    > library(vcd)    # provides the mosaic plot() method for loglm fits
    > ucb1 <- loglm(~Admit + Gender*Dept, data=UCBAdmissions)
    > ucb1
    Call:
    loglm(formula = ~Admit + Gender * Dept, data = UCBAdmissions)

    Statistics:
                      X^2 df P(> X^2)
    Likelihood Ratio  877 11        0
    Pearson           798 11        0
    > plot(ucb1)

    > ucb2 <- loglm(~Admit*Dept + Gender*Dept, data=UCBAdmissions)
    > ucb2
    Call:
    loglm(formula = ~Admit * Dept + Gender * Dept, data = UCBAdmissions)

    Statistics:
                      X^2 df P(> X^2)
    Likelihood Ratio   22  6   0.0014
    Pearson            20  6   0.0028
    > plot(ucb2)

maiya wrote:
> I am fairly new to log-linear modelling, so as opposed to trying to fit models, I am still trying to figure out how it actually works; hence I am looking at the interpretation of parameters. Now it seems most people skip this part and go directly to measuring model fit, so I am finding very few references to actual parameters, and I am of course clear on the fact that their choice is irrelevant for the actual model fit.
>
> But here is my question: loglin uses deviation contrasts, so the coefficients in each term add up to zero. Another option is indicator contrasts, where a reference category is chosen in each term and set to zero, while the others are relative to it. My question is whether there is a log-linear command equivalent to loglin that uses this second, dummy-coding style of constraints (I know e.g. SPSS GENLOG does this). I hope this is not too basic a question!
>
> And if anyone is up for answering the wider question of why log-linear parameters are not something to be looked at (which might just be my impression of the literature), feel free to comment!
>
> Thanks for your help!
>
> Maja

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249  Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
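Friendly's observation that ucb2 fits badly in only one department can also be checked numerically, without a mosaic plot. The sketch below is a base-R illustration, not code from the thread; `g2_by_dept` is a hypothetical helper name. It relies on the fact that the conditional-independence fit is simply a separate Admit x Gender independence fit inside each department, so the overall likelihood-ratio statistic splits into one G^2 term per department.

```r
# Per-department likelihood-ratio (G^2) contributions for the ucb2 model,
# i.e. Admit and Gender conditionally independent given Dept.
# UCBAdmissions is a 2 x 2 x 6 table: Admit x Gender x Dept.
g2_by_dept <- function(tab3) {                 # hypothetical helper
  sapply(dimnames(tab3)$Dept, function(dept) {
    tab <- tab3[, , dept]                        # 2 x 2 Admit x Gender slice
    expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)  # independence fit
    2 * sum(tab * log(tab / expected))           # G^2 for this department
  })
}

g2 <- g2_by_dept(UCBAdmissions)
round(g2, 2)

# The per-department pieces add up to the overall statistic that loglin()
# reports for the margins Admit:Dept = c(1, 3) and Gender:Dept = c(2, 3).
fit_ci <- loglin(UCBAdmissions, list(c(1, 3), c(2, 3)), print = FALSE)
c(sum_of_parts = sum(g2), loglin_lrt = fit_ci$lrt, df = fit_ci$df)
```

If the data behave as Friendly describes, department A should carry most of the total, with the other five departments contributing little.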
Re: [R] indicator or deviation contrasts in log-linear modelling
On Wed, 18 Feb 2009, maiya wrote:

> I realise that in the case of loglin the parameters are calculated post festum from the cell frequencies; however, other programmes that use Newton-Raphson as opposed to IPF work the other way round, right? In which case one would expect the output of parameters to be limited to the particular contrast used. But since loglin uses IPF, I would have thought the choice of style of parameter to be output could be made...
>
> Anyway, this is the line that interests me:
>
>   lm( as.vector( loglin(...,fit=TRUE)$fit ) ~ your favored contrasts )
>
> only I'm not proficient enough in R to figure out the last term :( How would I go about this then if my preferred contrast is setting the first categories as reference cats?

See An Introduction to R, Chapter 11, and try this:

    for (i in ls('package:stats', pattern = 'contr[.]')) {
        cat(i, '\n')
        print(get(i)(letters[1:5]))   # contrast matrix for a 5-level factor
        # refit under this coding for unordered factors
        options(contrasts = c(unordered = i, ordered = 'contr.poly'))
        print(coef(glm(Freq ~ Dept*Gender, as.data.frame(UCBAdmissions),
                       family = poisson)))
    }

> I literally just need the equivalent of
>
>   loglin(matrix(c(1,2,3,4), nrow=2), list(c(1,2)), param=TRUE)
>
> which would give me parameters under indicator contrast. glm... well, I'd have to work on it.
>
> Regarding the more general points:
>
> ad 2) I would have thought that direct inspection of cell frequencies is precisely the wrong/misleading thing to do: the highest-order coefficients can be inspected directly in order to see the interaction without the (lower) marginal effects, or alternatively the table can be standardized to uniform margins for the same sort of inspection.

OK, to each her own. But try this out yourself. What is the story here? (Review ?UCBAdmissions, if you need to.)

    > options(contrasts = c(unordered = 'contr.sum', ordered = 'contr.poly'))
    > print(cbind(coef(glm(Freq ~ Admit*Dept*Gender, as.data.frame(UCBAdmissions),
    +                      family = poisson))))
                                 [,1]
    (Intercept)           4.786575880
    Admit1               -0.277614562
    Dept1                 0.067824911
    Dept2                -0.758615446
    Dept3                 0.560293364
    Dept4                 0.446131873
    Dept5                -0.001254892
    Gender1               0.355262130
    Admit1:Dept1          0.786694268
    Admit1:Dept2          0.599494828
    Admit1:Dept3         -0.021374963
    Admit1:Dept4         -0.053867688
    Admit1:Dept5         -0.250913079
    Admit1:Gender1       -0.050744703
    Dept1:Gender1         0.782600986
    Dept2:Gender1         1.216370861
    Dept3:Gender1        -0.646880514
    Dept4:Gender1        -0.308737151
    Dept5:Gender1        -0.691810320
    Admit1:Dept1:Gender1 -0.212274286
    Admit1:Dept2:Gender1 -0.004260932
    Admit1:Dept3:Gender1  0.081975109
    Admit1:Dept4:Gender1  0.030247904
    Admit1:Dept5:Gender1  0.100791458

OK, got the whole story? Could you explain it to someone who is not a statistician?

Now try it again, but with this display:

    > ftable(UCBAdmissions)
                    Dept   A   B   C   D   E   F
    Admit    Gender
    Admitted Male        512 353 120 138  53  22
             Female       89  17 202 131  94  24
    Rejected Male        313 207 205 279 138 351
             Female       19   8 391 244 299 317

    > round(ftable(prop.table(UCBAdmissions, 2:3)), 2)
                    Dept    A    B    C    D    E    F
    Admit    Gender
    Admitted Male        0.62 0.63 0.37 0.33 0.28 0.06
             Female      0.82 0.68 0.34 0.35 0.24 0.07
    Rejected Male        0.38 0.37 0.63 0.67 0.72 0.94
             Female      0.18 0.32 0.66 0.65 0.76 0.93

You can pretty easily see that admission rates vary by department, that all departments but one have pretty equal admission rates by gender, and that in that department (A) the rate is about 20 percentage points higher for females. (And yes, a significance test confirms this.)

Maybe not as statistified as talking about three-way interactions and coefficients of products of contrasts, but I'll bet a lot of scientists would find the tables more compelling.

HTH,

Chuck

> ad 3) And yes, I figured as much! I can't see how lower-order terms can be interpreted at all if higher-order interactions exist? I've seen it done; e.g., I've seen it claimed that in a standardized table the lower-order terms are all equal to zero, which is of course not true?
>
> Thanks!
>
> Maja
> --
> View this message in context: http://www.nabble.com/indicator-or-deviation-contrasts-in-log-linear-modelling-tp22090104p22093070.html
> Sent from the R help mailing list archive at Nabble.com.

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
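Chuck's loop shows the coefficients changing with the contrast setting. As a complementary sketch (assuming Friendly's ucb2 model, refitted as a surrogate Poisson glm(); `fit_ucb2` is a hypothetical helper), the check below confirms that the coding changes the parameters but not the fitted counts, the deviance or its degrees of freedom.

```r
d <- as.data.frame(UCBAdmissions)   # columns Admit, Gender, Dept, Freq

# Fit the ucb2 model (Admit and Gender conditionally independent given Dept)
# as a surrogate Poisson GLM under a chosen coding for unordered factors.
fit_ucb2 <- function(coding) {      # hypothetical helper
  op <- options(contrasts = c(unordered = coding, ordered = "contr.poly"))
  on.exit(options(op))              # restore the global setting afterwards
  glm(Freq ~ Admit * Dept + Gender * Dept, family = poisson, data = d)
}

m_ind <- fit_ucb2("contr.treatment")  # indicator coding: first level is the reference
m_dev <- fit_ucb2("contr.sum")        # deviation coding: effects sum to zero

# Different parameters...
head(cbind(indicator = coef(m_ind), deviation = coef(m_dev)))

# ...but the same fitted counts, the same deviance and the same residual df
all.equal(fitted(m_ind), fitted(m_dev))
c(indicator = deviance(m_ind), deviation = deviance(m_dev))
df.residual(m_ind)
```

The deviance here is the same likelihood-ratio statistic that loglm() prints for ucb2, since both compare the observed counts with the same maximum-likelihood fitted counts.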
[R] indicator or deviation contrasts in log-linear modelling
I am fairly new to log-linear modelling, so as opposed to trying to fit models, I am still trying to figure out how it actually works; hence I am looking at the interpretation of parameters. Now it seems most people skip this part and go directly to measuring model fit, so I am finding very few references to actual parameters, and I am of course clear on the fact that their choice is irrelevant for the actual model fit.

But here is my question: loglin uses deviation contrasts, so the coefficients in each term add up to zero. Another option is indicator contrasts, where a reference category is chosen in each term and set to zero, while the others are relative to it. My question is whether there is a log-linear command equivalent to loglin that uses this second, dummy-coding style of constraints (I know e.g. SPSS GENLOG does this). I hope this is not too basic a question!

And if anyone is up for answering the wider question of why log-linear parameters are not something to be looked at (which might just be my impression of the literature), feel free to comment!

Thanks for your help!

Maja
--
View this message in context: http://www.nabble.com/indicator-or-deviation-contrasts-in-log-linear-modelling-tp22090104p22090104.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] indicator or deviation contrasts in log-linear modelling
On Wed, 18 Feb 2009, maiya wrote:

> I am fairly new to log-linear modelling, so as opposed to trying to fit models, I am still trying to figure out how it actually works; hence I am looking at the interpretation of parameters. Now it seems most people skip this part and go directly to measuring model fit, so I am finding very few references to actual parameters, and I am of course clear on the fact that their choice is irrelevant for the actual model fit. But here is my question: loglin uses deviation contrasts,

Depends on what you mean by 'uses'. From ?loglin:

QUOTE:
Details
The Iterative Proportional Fitting algorithm as presented in Haberman (1972) is used for fitting the model. At most iter iterations are performed, convergence is taken to occur when the maximum deviation between observed and fitted margins is less than eps. All internal computations are done in double precision; there is no limit on the number of factors (the dimension of the table) in the model.
END QUOTE

There are no explicit contrasts in IPF. The $param component returned when 'param=TRUE' is used is derived from the estimated cell frequencies. You can transform these to other basis vectors. If there are no structural zeros,

    lm( log( as.vector( loglin(..., fit=TRUE)$fit ) ) ~ your favored contrasts )

will give you estimates under your favored scheme (the fitted counts must be logged, since the parameters live on the log scale). Then too there is the surrogate Poisson approach, which will do this too.

> so the coefficients in each term add up to zero. Another option is indicator contrasts, where a reference category is chosen in each term and set to zero, while the others are relative to it. My question is whether there is a log-linear command equivalent to loglin that uses this second, dummy-coding style of constraints (I know e.g. SPSS GENLOG does this).

Yep, glm(). See

McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.

for details on surrogate Poisson modelling.

> I hope this is not too basic a question! And if anyone is up for answering the wider question of why log-linear parameters are not something to be looked at (which might just be my impression of the literature), feel free to comment!

I can think of three reasons:

1) IPF doesn't need the parameters to do its work and do tests based on loglinear models. The canonical reference is Bishop, Fienberg, and Holland's Discrete Multivariate Analysis, 1975.

2) In many applications, direct inspection of the cell frequencies or their estimates is quite natural.

3) Often there are higher-order effects (a four-way table with three-way interactions, say), so the lower-order parameter values are not easily interpreted anyway.

HTH,

Chuck

> Thanks for your help!
>
> Maja
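Chuck's lm() suggestion can be made concrete for the 2 x 2 table from Maja's question. This is a minimal sketch under stated assumptions: the table has no zero cells, the fitted counts are logged before regressing (the log-linear parameters live on the log scale), and the right-hand side contains the same terms as the loglin() margins, so lm() reproduces the fit exactly. The factor names A and B and their levels are made up for illustration. The surrogate Poisson glm() is fitted alongside for comparison.

```r
# Maja's 2 x 2 example, with (hypothetical) factor and level names added
tab <- matrix(c(1, 2, 3, 4), nrow = 2,
              dimnames = list(A = c("a1", "a2"), B = c("b1", "b2")))

# loglin(): IPF fit plus deviation-coded parameters (each term sums to zero)
ll <- loglin(tab, list(c(1, 2)), fit = TRUE, param = TRUE, print = FALSE)
ll$param

# Long format; as.data.frame(as.table()) varies A fastest, like as.vector()
d <- as.data.frame(as.table(tab))       # columns A, B, Freq
d$logfit <- log(as.vector(ll$fit))

# Indicator (treatment) contrasts: first level of each factor is the reference
op <- options(contrasts = c(unordered = "contr.treatment", ordered = "contr.poly"))
ind_lm  <- coef(lm(logfit ~ A * B, data = d))                    # regress logged fit
ind_glm <- coef(glm(Freq ~ A * B, family = poisson, data = d))   # surrogate Poisson
options(op)                                                      # restore the setting

cbind(lm = ind_lm, glm = ind_glm)
```

For this saturated table the indicator-coded values can be read straight off the cells: the intercept is log(1) = 0, the A and B effects are log(2) and log(3), and the interaction is log(1*4/(2*3)), the log odds ratio. loglin()'s sum-to-zero interaction for cell (a1, b1) is one quarter of that log odds ratio.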
Re: [R] indicator or deviation contrasts in log-linear modelling
I realise that in the case of loglin the parameters are calculated post festum from the cell frequencies; however, other programmes that use Newton-Raphson as opposed to IPF work the other way round, right? In which case one would expect the output of parameters to be limited to the particular contrast used. But since loglin uses IPF, I would have thought the choice of style of parameter to be output could be made...

Anyway, this is the line that interests me:

    lm( as.vector( loglin(...,fit=TRUE)$fit ) ~ your favored contrasts )

only I'm not proficient enough in R to figure out the last term :( How would I go about this then if my preferred contrast is setting the first categories as reference cats?

I literally just need the equivalent of

    loglin(matrix(c(1,2,3,4), nrow=2), list(c(1,2)), param=TRUE)

which would give me parameters under indicator contrast. glm... well, I'd have to work on it.

Regarding the more general points:

ad 2) I would have thought that direct inspection of cell frequencies is precisely the wrong/misleading thing to do: the highest-order coefficients can be inspected directly in order to see the interaction without the (lower) marginal effects, or alternatively the table can be standardized to uniform margins for the same sort of inspection.

ad 3) And yes, I figured as much! I can't see how lower-order terms can be interpreted at all if higher-order interactions exist? I've seen it done; e.g., I've seen it claimed that in a standardized table the lower-order terms are all equal to zero, which is of course not true?

Thanks!

Maja
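Maja's doubt about standardized tables can be checked with a small example. This is a sketch with a made-up 2 x 3 table whose row totals (6, 6) and column totals (4, 4, 4) are already uniform, so no standardization step is needed. Under loglin()'s sum-to-zero coding the row effects vanish but the column effects do not, so uniform margins do not force the lower-order terms to zero in general. (In a 2 x 2 table with uniform margins the cells must have the form a, b / b, a, and there the main effects do come out zero.)

```r
# Made-up 2 x 3 table: rows (1, 2, 3) and (3, 2, 1)
std <- matrix(c(1, 3, 2, 2, 3, 1), nrow = 2,
              dimnames = list(R = c("r1", "r2"), C = c("c1", "c2", "c3")))
rowSums(std)   # equal row totals
colSums(std)   # equal column totals

# Saturated model: the fit equals the table, and param = TRUE returns
# loglin()'s sum-to-zero effects computed from the logged fitted counts
p <- loglin(std, list(c(1, 2)), param = TRUE, print = FALSE)$param
p$R   # row main effects
p$C   # column main effects

# The same column effects by hand: column means of log(cells)
# minus the grand mean of log(cells)
colMeans(log(std)) - mean(log(std))
```

Here the column effects are roughly -0.048, 0.096 and -0.048: uniform totals equalize the sums of the cells, while the log-linear main effects are built from means of the logged cells.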