On Thursday, August 21, 2014 4:41:02 PM UTC-5, Thomas Covert wrote:
>
> Thanks for the thorough explanation. To be clear, though, if "f" is a
> PooledDataFactor, "f" is treated as a fixed effect in the Formula language,
> whereas "(1|f)" is treated as a random effect?
>
That's correct, except that the name of the type is PooledDataArray.
I should have been more specific about what are fixed-effects terms and
what are random-effects terms. A random-effects term is distinguished by
the vertical bar, "|". The precedence of operators requires that the
random-effects expression be enclosed in parentheses. The expression to
the left of the vertical bar is evaluated as a model matrix according to
the rules of the R formula language. In particular, the "intercept" term,
which generates a column of 1's, is implicit. Hence (1+x|g) and (x|g) are
equivalent and generate random slopes w.r.t. x and random intercepts for
each level of g. (Perhaps we should change this, but that is the way it is
now.) To suppress the intercept term you write (0+x|g).
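To make the implicit-intercept rule concrete, here is a rough sketch in Python (not the Julia code in DataFrames or MixedModels). The helper lhs_columns is hypothetical and only understands terms joined by "+", where each term is "1", "0" or the name of a numeric column.

```python
def lhs_columns(lhs, data):
    """Build model-matrix columns for the left-hand side of a (lhs|g) term.

    Hypothetical helper: handles only terms joined by '+', where each term
    is '1', '0', or the name of a numeric column in `data`.
    """
    terms = [t.strip() for t in lhs.split("+")]
    n = len(next(iter(data.values())))
    columns = {}
    # The intercept is implicit unless a '0' term suppresses it.
    if "0" not in terms:
        columns["(Intercept)"] = [1.0] * n
    for t in terms:
        if t not in ("0", "1"):
            columns[t] = [float(v) for v in data[t]]
    return columns


data = {"x": [0.5, 1.5, 2.5, 3.5]}
print(list(lhs_columns("x", data)))    # ['(Intercept)', 'x']
print(list(lhs_columns("1+x", data)))  # ['(Intercept)', 'x']
print(list(lhs_columns("0+x", data)))  # ['x']
```

The first two calls give the same columns, which is the sense in which (x|g) and (1+x|g) are equivalent.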
A fixed-effects term, f, where f is a PooledDataVector with k levels,
generates k-1 "contrast" columns. (If you know the technical definition of
a contrast as used in old analysis of variance descriptions these are not
really contrasts in that sense but that is irrelevant here.) Again, this
is because of the implied intercept: f is equivalent to 1 + f, and 1 + f
generates k columns consisting of the intercept column and k-1 of the k
indicator columns for the levels of f. We drop the indicator column for
the first level, again
following the R convention where the so-called "treatment" contrasts are
the default. R allows other contrast specifications. We haven't yet added
that capability to the formula language in Julia.
It happens that 0+f also generates k columns, which are the full set of
indicator columns.
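A small Python sketch of the column counts described above may help. It is only an illustration under the assumption that levels are taken in sorted order; the function names are made up and this is not how DataFrames builds its model matrix.

```python
def indicator_columns(f):
    """One 0/1 column per level of f, with levels in sorted order."""
    levels = sorted(set(f))
    return {lev: [1 if v == lev else 0 for v in f] for lev in levels}


def fixed_effect_columns(f, intercept=True):
    """Columns generated by a categorical term f.

    With an intercept (f, or 1 + f): a column of 1's plus the indicators
    for all levels except the first. Without (0 + f): all k indicators.
    """
    ind = indicator_columns(f)
    if not intercept:
        return ind
    first = sorted(ind)[0]
    cols = {"(Intercept)": [1] * len(f)}
    cols.update({lev: col for lev, col in ind.items() if lev != first})
    return cols


f = ["a", "b", "c", "a", "c", "b"]  # k = 3 levels
with_int = fixed_effect_columns(f)
without_int = fixed_effect_columns(f, intercept=False)
print(list(with_int))     # ['(Intercept)', 'b', 'c']
print(list(without_int))  # ['a', 'b', 'c']
```

Both forms produce k columns; they differ only in whether the first column is the intercept or the indicator for the first level.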
> Similarly, "f&g" is the cartesian product of the fixed effects for f and
> g?
>
An interaction term like f&g is rarely used by itself. The more common form
is f*g, which expands to the main effects and the second-order interaction.
That is, f*g expands to 1 + f + g + f&g. In this case f&g is the Cartesian
product of the contrast columns, that is, the elementwise product of each
contrast column for f with each contrast column for g. This corresponds to
a two-factor
analysis of variance with interaction. If g has l levels then the main
effects for f have k-1 degrees of freedom, the main effects for g have l-1
degrees of freedom and the interaction term has (k-1)(l-1) degrees of
freedom (assuming you have at least k*l observations and the model matrix
is of full rank). In these cases "degrees of freedom" means what it should
mean, the dimension of a linear subspace of the sample space.
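The column counts can be checked with a short Python sketch. It assumes treatment-style contrasts with the first (sorted) level dropped, and the helper names are invented for the illustration.

```python
from itertools import product


def contrast_columns(f):
    """Indicator columns for every level of f except the first (sorted)."""
    levels = sorted(set(f))[1:]
    return {lev: [1 if v == lev else 0 for v in f] for lev in levels}


def interaction_columns(f, g):
    """Elementwise products of each f contrast column with each g one."""
    cf, cg = contrast_columns(f), contrast_columns(g)
    return {
        f"{a}&{b}": [u * v for u, v in zip(cf[a], cg[b])]
        for a, b in product(cf, cg)
    }


# A balanced layout with one observation per cell: k = 3, l = 2.
f = ["a", "a", "b", "b", "c", "c"]
g = ["p", "q", "p", "q", "p", "q"]
k, l = len(set(f)), len(set(g))

inter = interaction_columns(f, g)
n_columns = 1 + len(contrast_columns(f)) + len(contrast_columns(g)) + len(inter)
print(list(inter))  # ['b&q', 'c&q']
print(n_columns)    # 6, i.e. 1 + (k-1) + (l-1) + (k-1)(l-1) = k*l
```

With one observation per cell the k*l columns of f*g exactly match the k*l observations, which is why at least that many observations are needed.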
> I see that (x|g) has random effects on the slope of x. Is there a way to
> get "fixed" slopes on x?
>
The full model expression would be
y ~ 1 + x + (1+x|g)
generating two fixed effects (the population-wide intercept and slope
w.r.t. x) and two random effects (change in the intercept and slope) for
each level of g. See the example of the sleepstudy data in the README.md
for the MixedModels package.
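To show how the fixed and random parts combine, here is a sketch in Python of the conditional mean implied by this formula. The function name and all of the numbers are made up for illustration; nothing here comes from the sleepstudy fit.

```python
def linear_predictor(x, g, beta, b):
    """Conditional mean for y ~ 1 + x + (1+x|g).

    beta = (intercept, slope) are the fixed effects; b maps each level of g
    to its (intercept shift, slope shift), i.e. the random effects.
    """
    beta0, beta1 = beta
    return [
        (beta0 + b[gi][0]) + (beta1 + b[gi][1]) * xi
        for xi, gi in zip(x, g)
    ]


# Made-up numbers, purely for illustration.
x = [0.0, 1.0, 0.0, 1.0]
g = ["s1", "s1", "s2", "s2"]
beta = (250.0, 10.0)
b = {"s1": (5.0, -2.0), "s2": (-5.0, 2.0)}
eta = linear_predictor(x, g, beta, b)
print(eta)  # [255.0, 263.0, 245.0, 257.0]
```

Each level of g gets its own line, whose intercept and slope are the population-wide values plus that level's shifts.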
Let's take any further discussion of the model formula language to the
Julia-Stats group so as not to frighten those who remember with horror
their introductory statistics course.
>
> On Thursday, August 21, 2014 4:35:05 PM UTC-5, Douglas Bates wrote:
>>
>> On Thursday, August 21, 2014 4:20:46 PM UTC-5, Thomas Covert wrote:
>>>
>>> Is there a reference somewhere for the formula language specified in
>>> DataFrames and used in MixedModels? In particular, I'm confused about how
>>> fixed- and random-effects are separately specified. For example, suppose
>>> I've got Y_{it} = X_{it}b + u_i + e_{it}. My understanding is that a fixed
>>> effect spec for this is Y ~ X + (1|i). What is the random effects
>>> specification? If I had a separate categorical variable Z, how would I
>>> write Y_{it} = X_{it}b + {Fixed effects on categories of Z} + u_i + e_{it},
>>> with random effects on i?
>>
>>
>> The formula language used in MixedModels is similar to that used in the
>> lme4 package for R. There are examples in the README.md file and in the
>> demo and docs directories of the package.
>>
>> To specify random effects you need to have a factor (PooledDataVector in
>> Julia parlance) which would correspond to the i subscript in your
>> specification. Call this g and the response y. Then a simple random
>> effects model is written as
>>
>> y ~ 1 + (1|g)
>>
>> A model with covariates x (numeric) and f (categorical) is written
>>
>> y ~ x + f + (1|g)
>>
>> A model with these covariates, random slopes and random intercepts is
>> written
>>
>> y ~ x + f + (x|g)
>>
>> Perhaps it would be best to use the issue tracker in the MixedModels
>> package if you have further questions of this type.
>>
>>>
>>> Thanks.
>>>
>>> -Thom
>>>
>>