[julia-users] Re: reference for the Formula language used in MixedModels

Douglas Bates Fri, 22 Aug 2014 13:02:02 -0700

On Thursday, August 21, 2014 4:41:02 PM UTC-5, Thomas Covert wrote:
>
> Thanks for the thorough explanation.  To be clear, though, if "f" is a 
> PooledDataFactor, "f" is treated as a fixed effect in the Formula language, 
> whereas "(1|f)" is treated as a random effect?
>

That's correct, except that the name of the type is PooledDataArray. 

I should have been more specific about what are fixed-effects terms and 
what are random-effects terms.  A random-effects term is distinguished by 
the vertical bar, "|". The precedence of operators requires that the 
random-effects expression be enclosed in parentheses.  The expression to 
the left of the vertical bar is evaluated as a model matrix according to 
the rules of the R formula language.  In particular, the "intercept" term, 
which generates a column of 1's, is implicit.  Hence (1+x|g) and (x|g) are 
equivalent and generate random slopes w.r.t. x and random intercepts for 
each level of g.  (Perhaps we should change this but that is the way it is 
now).  To suppress an intercept term you write (0+x|g).
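
As a rough sketch of what the implicit intercept means for a random-effects term, here is a plain-Python illustration (not the MixedModels code; the function name and the column ordering are assumptions made for this example). Each level of g contributes an indicator column for the random intercept and an indicator-times-x column for the random slope; (0+x|g) keeps only the latter.

```python
# Illustrative only: plain Python, not the MixedModels implementation.
def random_effects_columns(x, g, intercept=True):
    """Columns for (x|g) when intercept=True, or (0+x|g) when intercept=False."""
    columns = []
    for lev in sorted(set(g)):
        # 0/1 indicator of membership in this level of g
        ind = [1.0 if v == lev else 0.0 for v in g]
        if intercept:
            columns.append(ind)  # random intercept for this level
        # random slope w.r.t. x for this level
        columns.append([i * xi for i, xi in zip(ind, x)])
    return columns

x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
g = ["s1", "s1", "s1", "s2", "s2", "s2"]

z_full = random_effects_columns(x, g)          # (x|g), same as (1+x|g)
z_slope = random_effects_columns(x, g, False)  # (0+x|g)
```

With two levels of g, z_full has four columns (an intercept and a slope column per level) and z_slope has two.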

A fixed-effects term, f, where f is a PooledDataVector with k levels, 
generates k-1 "contrast" columns.  (If you know the technical definition of 
a contrast as used in old analysis of variance descriptions these are not 
really contrasts in that sense but that is irrelevant here.)  Again, this 
is because of the implied intercept: 1 + f is equivalent to f, and 1 + f 
generates k columns consisting of the intercept column and k-1 of the k 
indicator columns for the levels of f.  We drop the first indicator column, again 
following the R convention where the so-called "treatment" contrasts are 
the default.  R allows other contrast specifications. We haven't yet added 
that capability to the formula language in Julia.

It happens that 0+f also generates k columns, which are the full set of 
indicator columns. 
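
The column counts can be checked with a small plain-Python sketch of the "treatment" coding described above. This is only an illustration under my own naming (the helper functions are hypothetical, not part of DataFrames), and it assumes levels are ordered by sorting.

```python
# Illustrative only: plain Python, not the DataFrames implementation.
def indicator_columns(f):
    """One 0/1 indicator column per level of f, levels in sorted order."""
    levels = sorted(set(f))
    return [[1 if v == lev else 0 for v in f] for lev in levels]

def fixed_effect_columns(f, intercept=True):
    """Columns generated by the term f, with or without the implied intercept."""
    ind = indicator_columns(f)
    if intercept:
        # 1 + f: intercept column plus k-1 indicators (first level dropped)
        return [[1] * len(f)] + ind[1:]
    # 0 + f: the full set of k indicator columns
    return ind

f = ["a", "b", "c", "a", "b", "c"]               # k = 3 levels
with_intercept = fixed_effect_columns(f)          # 1 + f
without_intercept = fixed_effect_columns(f, False)  # 0 + f
```

Both results have k = 3 columns; they differ only in whether the first column is the intercept or the indicator of the first level.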

>  Similarly, "f&g" is the cartesian product of the fixed effects for f and 
> g?  
>

An interaction term like f&g is rarely used by itself.  The more common form 
is f*g which expands to the main effects and the second order interaction. 
That is, f*g expands to 1 + f + g + f&g.  In this case f&g is the Cartesian 
product of the contrast columns.  This corresponds to a two-factor 
analysis of variance with interaction.  If g has l levels then the main 
effects for f have k-1 degrees of freedom, the main effects for g have l-1 
degrees of freedom and the interaction term has (k-1)(l-1) degrees of 
freedom (assuming you have at least k*l observations and the model matrix 
is of full rank).  In these cases "degrees of freedom" means what it should 
mean, the dimension of a linear subspace of the sample space.
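
To make the degrees-of-freedom count concrete, here is a plain-Python sketch (illustrative only; the helper names are made up and this is not how DataFrames builds the matrix). It forms the f&g columns as elementwise products of each contrast column of f with each contrast column of g, for a fully crossed design with k = 3 and l = 2.

```python
# Illustrative only: plain Python, not the DataFrames implementation.
from itertools import product

def contrast_columns(f):
    """k-1 indicator columns for f (first level, in sorted order, dropped)."""
    levels = sorted(set(f))
    return [[1 if v == lev else 0 for v in f] for lev in levels[1:]]

def interaction_columns(f, g):
    """Elementwise product of every f contrast column with every g contrast column."""
    return [[a * b for a, b in zip(cf, cg)]
            for cf, cg in product(contrast_columns(f), contrast_columns(g))]

f = ["a", "b", "c", "a", "b", "c"]   # k = 3 levels
g = ["u", "u", "u", "v", "v", "v"]   # l = 2 levels

fg = interaction_columns(f, g)       # (k-1)(l-1) = 2 columns
n_columns = 1 + len(contrast_columns(f)) + len(contrast_columns(g)) + len(fg)
```

Here n_columns is 1 + 2 + 1 + 2 = 6 = k*l, which is why at least k*l observations are needed for the model matrix to have full rank.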

> I see that (x|g) has random effects on the slope of x.  Is there a way to 
> get "fixed" slopes on x?
>

The full model expression would be

y ~ 1 + x + (1+x|g)

generating two fixed effects, the population-wide intercept and slope 
w.r.t. x, and two random effects (changes in the intercept and slope) for 
each level of g.  See the example of the sleepstudy data in the README.md 
for the MixedModels package.
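
A minimal plain-Python sketch of how the fixed and random effects in this model combine for each observation. The function name and all numeric values are invented for illustration; they are not estimates from the sleepstudy data.

```python
# Illustrative only: made-up coefficients, not a fitted model.
def linear_predictor(x, g, beta, b):
    """beta = (intercept, slope) for the population;
    b maps each level of g to (change in intercept, change in slope)."""
    return [(beta[0] + b[lev][0]) + (beta[1] + b[lev][1]) * xi
            for xi, lev in zip(x, g)]

x = [0.0, 1.0, 0.0, 1.0]
g = ["s1", "s1", "s2", "s2"]
beta = (250.0, 10.0)                          # two fixed effects
b = {"s1": (5.0, -2.0), "s2": (-5.0, 2.0)}    # two random effects per level of g

eta = linear_predictor(x, g, beta, b)
```

Each level of g gets its own line: s1 has intercept 255 and slope 8, s2 has intercept 245 and slope 12, both expressed as deviations from the population-wide 250 and 10.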

Let's take any further discussion of the model formula language to the 
Julia-Stats group so as not to frighten those who remember with horror 
their introductory statistics course.

>
> On Thursday, August 21, 2014 4:35:05 PM UTC-5, Douglas Bates wrote:
>>
>> On Thursday, August 21, 2014 4:20:46 PM UTC-5, Thomas Covert wrote:
>>>
>>> Is there a reference somewhere for the formula language specified in 
>>> DataFrames and used in MixedModels?  In particular, I'm confused about how 
>>> fixed- and random-effects are separately specified.  For example, suppose 
>>> I've got Y_{it} = X_{it}b + u_i + e_{it}.  My understanding is that a fixed 
>>> effect spec for this is Y ~ X + (1|i).  What is the random effects 
>>> specification?  If I had a separate categorical variable Z, how would I 
>>> write Y_{it} = X_{it}b + {Fixed effects on categories of Z} + u_i + e_{it}, 
>>> with random effects on i?
>>
>>
>> The formula language used in MixedModels is similar to that used in the 
>> lme4 package for R.  There are examples in the README.md file and in the 
>> demo and docs directories of the package.
>>
>> To specify random effects you need to have a factor (PooledDataVector in 
>> Julia parlance) which would correspond to the i subscript in your 
>> specification.  Call this g and the response y.  Then a simple random 
>> effects model is written as
>>
>> y ~ 1 + (1|g)
>>
>> A model with covariates, x (numeric) and f (categorical), is written
>>
>> y ~ x + f + (1|g)
>>
>> A model with these covariates, random slopes and random intercepts is 
>> written
>>
>> y ~ x + f + (x|g)
>>
>> Perhaps it would be best to use the issue tracker in the MixedModels 
>> package if you have further questions of this type.
>>
>>>
>>> Thanks.
>>>
>>> -Thom
>>>
>>
