Introduction to Mixed (Hierarchical) models for biologists using R (IMBR01)

Delivered by Prof. Subhash Lele

This 5-day course will run from 14th to 18th May 2018 at Orford Musique, 
Orford, Quebec, Canada.

Mixed models, also known as hierarchical or multilevel models, are a 
useful class of models for many applied sciences, including biology, 
ecology and evolution. The goal of this course is to give a thorough 
introduction to the logic, theory and, most importantly, implementation of 
these models to solve practical problems in ecology. Participants are not 
expected to know mathematics beyond basic algebra and calculus. 
Participants are expected to know some R programming and to be familiar 
with linear and generalized linear regression. We will use JAGS 
(Just Another Gibbs Sampler) to run Markov Chain Monte Carlo (MCMC) 
simulations for analyzing mixed models. The course will be conducted so 
that participants get substantial hands-on experience.


Course content is as follows

Monday 14th
Linear and Generalized linear models

To understand mixed models, the most important first step is to thoroughly 
understand linear and generalized linear models. Also, when conducting 
a data analysis, it is useful to fit a simpler fixed effects model before 
trying to fit a more complex mixed effects model. Hence, we will start with 
a very detailed review of these models. We assume that participants are 
familiar with these models, so we will emphasize some important, but not 
commonly covered, topics. This will also give us an opportunity to unify 
the notation, review the basic R commands and fill in any gaps in 
knowledge and understanding of these topics.
1. We will show the use of non-parametric exploratory techniques such as 
classification and regression trees (CART) for learning about important 
covariates and possible non-linearities in the relationships.
2. We will emphasize graphical and simulation based methods (e.g. Gelman 
and Hill, 2006) to understand and explore the implications of the fitted 
model.
3. We will discuss graphical tools such as marginal and conditional plots 
that are useful for conveying the results of a multiple regression model to 
a lay person.
4. We will emphasize the use of graphical tools to carry out regression 
diagnostics and assess the appropriateness of the model.
5. We will discuss the important concepts of confounding, effect 
modification and interaction. These are particularly important for drawing 
causal, not just correlational, inferences from observational studies.

Tuesday 15th – Classes from 09:00 to 17:00
Computational inference

Many of the topics that will be covered involve the use of matrix algebra 
and calculus. While these mathematical techniques are essential tools for a 
mathematical statistician who is trying to understand the theory behind the 
methods, they can be avoided in practice by using simulation based 
techniques. Built-in functions such as ’lm’ and ’glm’ fit regression 
models by the method of maximum likelihood, which they use to estimate the 
parameters and conduct statistical inference. We will discuss the use of 
JAGS (Just Another Gibbs Sampler) and the R package ’dclone’ to fit the 
same models. To do so, we will use a different statistical philosophy, 
namely Bayesian inference. We will show how the Bayesian approach can 
be tricked into giving frequentist answers using data cloning (Lele et al. 
2007, Ecology Letters). We will also discuss the rudiments of frequentist 
and Bayesian inference, although we will not go into their pros and cons 
at this time. That will be covered during sessions 3 and 4 of the fifth 
day (and over beer afterwards).
1. What makes an inference statistical inference?
2. What do we mean by probability of an event?
3. How do we quantify uncertainty in an inferential statement in the 
frequentist framework?
4. How do we quantify uncertainty in an inferential statement in the 
Bayesian framework?
We will then discuss the simulation based methods to quantify uncertainty.
1. Parametric bootstrap to quantify frequentist uncertainty
2. Markov Chain Monte Carlo to quantify Bayesian uncertainty
3. Fitting LM and GLM using JAGS and Bayesian approach
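
As a rough illustration of the parametric bootstrap named in item 1 above, 
here is a minimal sketch in base R. It is not course material: the data are 
simulated, and the sample size, the true mean and the number of bootstrap 
replicates are all assumptions chosen only to make the example run.

```r
# Hypothetical example: the "observed" counts are simulated for illustration.
set.seed(42)
y <- rpois(n = 60, lambda = 4)

# Step 1: fit the model (intercept-only Poisson GLM with a log link).
fit <- glm(y ~ 1, family = poisson)
lambda_hat <- exp(coef(fit)[[1]])   # estimated mean count

# Step 2: simulate new data sets from the fitted model and refit each one.
n_boot <- 1000                      # assumed number of bootstrap replicates
boot_est <- replicate(n_boot, {
  y_star <- rpois(length(y), lambda_hat)
  exp(coef(glm(y_star ~ 1, family = poisson))[[1]])
})

# Step 3: the spread of the refitted estimates quantifies the uncertainty.
boot_ci <- quantile(boot_est, probs = c(0.025, 0.975))
print(lambda_hat)
print(boot_ci)
```

The same three steps (fit, simulate from the fit, refit) carry over to more 
complex models; only the fitting and simulation calls change.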

Wednesday 16th – Classes from 09:00 to 17:00
Linear Mixed Models

Historically, linear mixed models arose in the study of quantitative 
genetics and heritability. They were successfully applied in animal 
breeding and led to the ’white’ revolution, with an abundant milk supply 
for the developing world. They were also used in horse racing and other 
such fun areas. The other setting in which linear mixed effects models 
were developed was the analysis of growth curves. We will follow this 
historical trajectory of mixed models, paying tribute to the great 
statisticians R. A. Fisher, C. R. Rao and Jerzy Neyman, and study linear 
mixed models first. The questions they tried to answer included deciding 
the genetic value of a sire and/or a dam, studying the heritability of 
traits and studying the co-evolution of traits. These can be answered 
provided we assume that the sires and dams in our experiment or sample are 
merely a sample from a super-population of sires and dams. In growth curve 
analysis, we need to take into account that each individual is unique in 
its own way but is also a part of a population. How do we make both 
individual level and population level inferences? In modern times, linear 
mixed effects models have arisen in the context of small area estimation in 
survey sampling, where one is interested in making inferences about a 
census tract based on county or state level data. These models also arise 
in the context of combining remotely sensed data of different resolutions 
and types. The main issues that we will be discussing are:
1. What is a random effect? What is a fixed effect? How do we decide if an 
effect is random or fixed?
2. How do we modify a linear regression model to accommodate random 
effects?
3. Why bother fitting a mixed effects model? What do we gain?
4. How do we modify the JAGS linear model program to fit a linear mixed 
effects model?
5. What is the difference between Bayesian and frequentist inference?
6. What is a prior? What is a non-informative prior?
7. How do we interpret the results of a linear mixed effects model fit? 
Graphical and simulation based methods
8. How do we do model selection with mixed effects models?
9. How do we do model diagnostics in mixed effects models?
10. Parameter identifiability issues in linear mixed models
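
To give a flavour of questions 2 and 3, the sketch below simulates data 
from a regression with one random intercept per group and then fits an 
ordinary regression that ignores the grouping. It is a minimal base R 
illustration, not course material; the group structure (think of sires and 
their offspring) and all parameter values are assumptions.

```r
set.seed(7)
n_groups <- 30                      # e.g. sires (hypothetical)
n_per_group <- 10                   # e.g. offspring per sire (hypothetical)
group <- rep(seq_len(n_groups), each = n_per_group)
x <- rnorm(n_groups * n_per_group)

# Fixed effects (beta0, beta1) plus one random intercept b_j per group:
#   y_ij = beta0 + beta1 * x_ij + b_j + e_ij,  b_j ~ N(0, sd_b^2)
beta0 <- 1
beta1 <- 0.5
sd_b <- 2                           # assumed between-group SD
sd_e <- 1                           # assumed residual SD
b <- rnorm(n_groups, mean = 0, sd = sd_b)
y <- beta0 + beta1 * x + b[group] + rnorm(length(x), mean = 0, sd = sd_e)

# An ordinary regression ignores the grouping ...
fit_lm <- lm(y ~ x)

# ... so its residuals still carry the group effects: the variance of the
# group-mean residuals is far above sd_e^2 / n_per_group, the value
# expected if all observations were independent.
group_mean_resid <- tapply(resid(fit_lm), group, mean)
print(var(group_mean_resid))
print(sd_e^2 / n_per_group)
```

A mixed effects model treats the b_j as draws from a common distribution 
and estimates sd_b along with the regression coefficients.
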
As we discuss these applications, we will discuss some subtle computational 
issues involved in using MCMC. In my recollection (which may be biased as 
it has been about 25 years since the quote), Daryl Pregibon said: MCMC is 
the crack cocaine of modern statistics; it is addictive, seductive and 
destructive. Hence, it is important for a practitioner to understand these 
issues in order not to misuse the MCMC technique.
1. What is a Markov Chain Monte Carlo method? Why is it necessary for mixed 
models?
2. What are the subtleties in implementing MCMC? Convergence of the 
algorithm, mixing of the chains.
3. Pros and cons of using MCMC

Thursday 17th – Classes from 09:00 to 17:00
Generalised Linear Mixed Models

We will again start the discussion of GLMM in its historical context. One 
of the initial uses of mixed models was in the context of over-dispersion 
in count data. Zero inflated count data was another important example. The 
example that drove the current revolution in the use of GLMM was in the 
context of spatial epidemiology. Clayton and Kaldor (1987, Biometrics) 
showed that one can use spatial correlation to improve the prediction in 
mapping disease rates. This was also an example of the application of 
Empirical Bayes methods that allow one to pool information from different 
spatial areas (or studies, or scales, and so on).
1. Zero inflated data: In many practical situations, we observe that there 
are many locations with zero counts, far in excess of what would be 
expected under the Poisson regression model. This can be effectively 
modelled using a mixed model framework. The mixed models framework allows 
us to use much more complex and realistic models.
2. Over-dispersion in GLM, Spatial GLM, Spatio-temporal GLM: The Poisson 
regression model assumes that the mean and variance are equal. This is 
often not true in practice. Generally the variance in the data exceeds the 
mean. One can show that such over-dispersion can be modelled using a mixed 
effects model. These models also arise in the context of capture-recapture 
sampling where capture probabilities vary across space or time or 
individuals.
3. Longitudinal or panel data with discrete response variable: Many times 
we have data on different individuals where within the individual there is 
temporal dependence but individuals are independent of each other. Cluster 
sampling is another situation where we have dependence within a cluster but 
independence between clusters. The analysis of such data needs to take into 
account the innate variation between individuals before one can discuss the 
effect of interesting covariates or risk factors. Such data are effectively 
modelled as GLMM.
4. Measurement error, missing data: Missing data and measurement error are 
ubiquitous in ecological studies. Mixed models provide a convenient way to 
take into account these difficulties and make inferences about the 
underlying processes of interest. We will discuss these issues in the 
context of Population Viability Analysis, Spatial population dynamics and 
source-sink analysis, Occupancy and abundance surveys. These issues also 
arise in ordinary linear and generalized linear models if the covariates 
are measured with error.
5. Additional topics depending on the interest of the participants. These 
may include, for example, discussion of Species Distribution Models, 
Resource Selection Functions and Animal movement models.
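
The over-dispersion point in item 2 above can be checked by simulation. 
The following base R sketch is only an illustration under assumed values 
(the mean count and the random effect standard deviation are made up): it 
compares plain Poisson counts with Poisson counts whose log-mean includes a 
normal random effect.

```r
set.seed(1)
n <- 5000
mu <- 3                             # hypothetical mean count

# Plain Poisson counts: the variance is close to the mean.
y_pois <- rpois(n, lambda = mu)

# Poisson counts with a normal random effect on the log scale, one per
# observation. Subtracting sigma^2 / 2 keeps the marginal mean equal to mu.
sigma <- 0.7                        # hypothetical random effect SD
b <- rnorm(n, mean = 0, sd = sigma)
y_mixed <- rpois(n, lambda = exp(log(mu) - sigma^2 / 2 + b))

# Variance-to-mean ratio: about 1 for Poisson, well above 1 with the
# random effect.
ratio <- c(poisson = var(y_pois) / mean(y_pois),
           mixed   = var(y_mixed) / mean(y_mixed))
print(round(ratio, 2))
```

Both samples have the same mean by construction, so the extra variance in 
the second one comes entirely from the random effect.
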
6 Computational issues: Advanced topics

Friday 18th – Classes from 09:00 to 17:00
Mixed Models in a Bayesian Framework

MCMC is not the only approach to analyse mixed models. We will briefly 
discuss Laplace approximation based techniques (INLA, in particular) along 
with approximate techniques such as Composite likelihood and Approximate 
Bayesian Computation. Because of their mathematical nature, this discussion 
will be somewhat limited, only giving the basics and hinting at the 
important issues.
7 Philosophical issues: Sophie’s choice
1. What are the philosophical problems with using the frequentist 
quantification of uncertainty?
2. What are the philosophical problems with using the Bayesian 
quantification of uncertainty?
3. Sophie’s choice?

Please email any inquiries to [email protected] or visit our 
website www.prstatistics.com

Please feel free to distribute this material anywhere you feel is suitable.

Other upcoming courses

1.      November 6th – 10th 2017
LANDSCAPE GENETIC DATA ANALYSIS USING R #LNDG
Margam Discovery Centre, Wales, Prof. Rodney Dyer
http://www.prstatistics.com/course/landscape-genetic-data-analysis-using-r-lndg02/

2.      November 20th - 25th 2017
APPLIED BAYESIAN MODELLING FOR ECOLOGISTS AND EPIDEMIOLOGISTS #ABME
SCENE, Scotland, Dr. Matt Denwood
http://www.prstatistics.com/course/applied-bayesian-modelling-ecologists-epidemiologists-abme03/

3.      November 27th – December 1st 2017
INTRODUCTION TO PYTHON FOR BIOLOGISTS #IPYB
Margam Discovery Centre, Wales, Dr. Martin Jones
http://www.prinformatics.com/course/introduction-to-python-for-biologists-ipyb04/
----------------------------------------------------------------------------
4.      December 4th - 8th 2017
ADVANCING IN STATISTICAL MODELLING USING R #ADVR
Margam Discovery Centre, Wales, Dr. Luc Bussiere, Dr. Tom Houslay, Dr. Ane 
Timenes Laugen
http://www.prstatistics.com/course/advancing-statistical-modelling-using-r-advr07/
----------------------------------------------------------------------------
5.      January 29th – February 2nd 2018
INTRODUCTION TO BAYESIAN HIERARCHICAL MODELLING #IBHM
SCENE, Scotland, Dr. Andrew Parnell
http://www.prstatistics.com/course/introduction-to-bayesian-hierarchical-modelling-using-r-ibhm02/

6.      January 29th – February 2nd 2018
PHYLOGENETIC DATA ANALYSIS USING R #PHYL
SCENE, Scotland, Dr. Emmanuel Paradis
https://www.prstatistics.com/course/introduction-to-phylogenetic-analysis-with-r-phyg-phyl02/
----------------------------------------------------------------------------

7.      February 19th – 23rd 2018
MOVEMENT ECOLOGY #MOVE
Margam Discovery Centre, Wales, Dr Luca Borger, Dr Ronny Wilson, Dr 
Jonathan Potts
https://www.prstatistics.com/course/movement-ecology-move01/

8.      February 19th – 23rd 2018
GEOMETRIC MORPHOMETRICS USING R #GMMR
Margam Discovery Centre, Wales, Prof. Dean Adams, Prof. Michael Collyer, 
Dr. Antigoni Kaliontzopoulou
http://www.prstatistics.com/course/geometric-morphometrics-using-r-gmmr01/
----------------------------------------------------------------------------

9.      March 5th - 9th 2018
SPATIAL PRIORITIZATION USING MARXAN #MRXN
Margam Discovery Centre, Wales, Jennifer McGowan   
https://www.prstatistics.com/course/introduction-to-marxan-mrxn01/

10.     March 12th - 16th 2018
ECOLOGICAL NICHE MODELLING USING R #ENMR
Glasgow, Scotland, Dr. Neftali Sillero
http://www.prstatistics.com/course/ecological-niche-modelling-using-r-enmr02/

11.     March 19th – 23rd 2018
BEHAVIOURAL DATA ANALYSIS USING MAXIMUM LIKELIHOOD IN R #BDML
Glasgow, Scotland, Dr William Hoppitt
http://www.psstatistics.com/course/behavioural-data-analysis-using-maximum-likelihood-bdml01/
----------------------------------------------------------------------------

12.     April 9th – 13th 2018 
NETWORK ANALYSIS FOR ECOLOGISTS USING R #NTWA
Glasgow, Scotland, Dr. Marco Scotti   
https://www.prstatistics.com/course/network-analysis-ecologists-ntwa02/

13.     April 16th – 20th 2018
INTRODUCTION TO STATISTICAL MODELLING FOR PSYCHOLOGISTS USING R #IPSY
Glasgow, Scotland, Dr. Dale Barr, Dr Luc Bussiere
http://www.psstatistics.com/course/introduction-to-statistics-using-r-for-psychologists-ipsy01/

14.     April 23rd – 27th 2018
MULTIVARIATE ANALYSIS OF ECOLOGICAL COMMUNITIES USING THE VEGAN PACKAGE 
#VGNR
Glasgow, Scotland, Dr. Peter Solymos, Dr. Guillaume Blanchet             
https://www.prstatistics.com/course/multivariate-analysis-of-ecological-communities-in-r-with-the-vegan-package-vgnr01/

15.     April 30th – 4th May 2018
QUANTITATIVE GEOGRAPHIC ECOLOGY: MODELING GENOMES, NICHES, AND COMMUNITIES 
#QGER
Glasgow, Scotland, Dr. Dan Warren, Dr. Matt Fitzpatrick
https://www.prstatistics.com/course/quantitative-geographic-ecology-using-r-modelling-genomes-niches-and-communities-qger01/
----------------------------------------------------------------------------

16.     May 7th – 11th 2018
ADVANCES IN MULTIVARIATE ANALYSIS OF SPATIAL ECOLOGICAL DATA USING R #MVSP
CANADA (QUEBEC), Prof. Pierre Legendre, Dr. Guillaume Blanchet
https://www.prstatistics.com/course/advances-in-spatial-analysis-of-multivariate-ecological-data-theory-and-practice-mvsp03/

17.     May 14th - 18th 2018
INTRODUCTION TO MIXED (HIERARCHICAL) MODELS FOR BIOLOGISTS #IMBR
CANADA (QUEBEC), Prof Subhash Lele 
https://www.prstatistics.com/course/introduction-to-mixed-hierarchical-models-for-biologists-using-r-imbr01/

18.     May 21st - 25th 2018
INTRODUCTION TO PYTHON FOR BIOLOGISTS #IPYB
SCENE, Scotland, Dr. Martin Jones
http://www.prinformatics.com/course/introduction-to-python-for-biologists-ipyb05/

19.     May 21st - 25th 2018
INTRODUCTION TO REMOTE SENSING AND GIS FOR ECOLOGICAL APPLICATIONS
Glasgow, Scotland, Prof. Duccio Rocchini, Dr. Luca Delucchi
https://www.prinformatics.com/course/introduction-to-remote-sensing-and-gis-for-ecological-applications-irms01/

20.     May 28th – 31st 2018
STABLE ISOTOPE MIXING MODELS USING SIAR, SIBER AND MIXSIAR #SIMM
CANADA (QUEBEC), Dr. Andrew Parnell, Dr. Andrew Jackson
https://www.prstatistics.com/course/stable-isotope-mixing-models-using-r-simm04/

21.     May 28th – June 1st 2018
ADVANCED PYTHON FOR BIOLOGISTS #APYB
SCENE, Scotland, Dr. Martin Jones
https://www.prinformatics.com/course/advanced-python-biologists-apyb02/
----------------------------------------------------------------------------

22.     June 12th – 15th 2018
SPECIES DISTRIBUTION MODELLING #DBMR
Myuna Bay sport and recreation, Australia, TBC
COMING SOON  www.PRstatistics.com

23.     June 12th – 15th 2018
MARK RECAPTURE METHODS IN ECOLOGY #MKRC
Myuna Bay sport and recreation, Australia, TBC
COMING SOON  www.PRstatistics.com

24.     June 18th – 22nd 2018
STRUCTURAL EQUATION MODELLING FOR ECOLOGISTS AND EVOLUTIONARY BIOLOGISTS 
USING R #SEMR
Myuna Bay sport and recreation, Australia, TBC
COMING SOON  www.PRstatistics.com
----------------------------------------------------------------------------

25.     July 2nd - 5th 2018
SOCIAL NETWORK ANALYSIS FOR BEHAVIOURAL SCIENTISTS USING R #SNAR
Glasgow, Scotland, Prof James Curley
http://www.psstatistics.com/course/social-network-analysis-for-behavioral-scientists-snar01/

26.     July 8th – 12th 2018
MODEL BASED MULTIVARIATE ANALYSIS OF ABUNDANCE DATA USING R #MBMV
Glasgow, Scotland, Prof David Warton
https://www.prstatistics.com/course/model-base-multivariate-analysis-of-abundance-data-using-r-mbmv02/

27.     July 16th – 20th 2018
PRECISION MEDICINE BIOINFORMATICS: FROM RAW GENOME AND TRANSCRIPTOME DATA 
TO CLINICAL INTERPRETATION #PMBI
Glasgow, Scotland, Dr Malachi Griffith, Dr. Obi Griffith
COMING SOON www.prinformatics.com

28.     July 23rd – 27th 2018
EUKARYOTIC METABARCODING
Glasgow, Scotland, Dr. Owen Wangensteen
http://www.prinformatics.com/course/eukaryotic-metabarcoding-eukb01/
----------------------------------------------------------------------------
