[ECOLOG-L] FINAL CALL FOR Introduction to bioinformatics for DNA and RNA sequence analysis

Oliver Hooker Wed, 03 Oct 2018 16:52:07 -0700

FINAL CALL FOR Introduction to bioinformatics for DNA and RNA sequence 
analysis (IBDR01)


https://www.prinformatics.com/course/introduction-to-bioinformatics-for-dna-and-rna-sequence-analysis-ibdr01/

This course will be delivered by Dr. Malachi Griffith in Glasgow city 
centre from 29 October to 2 November 2018.

Course Overview:
Analysis of high-throughput genome and transcriptome data is a major 
component of many research projects ranging from large-scale precision 
medicine efforts to focused investigations in model systems. This analysis 
involves the identification of specific genome or transcriptome features 
that predispose individuals to disease, predict response to therapies, 
influence diagnosis/prognosis, or provide mechanistic insights into disease 
models. During this course (IBDR01), students will perform an example end-
to-end bioinformatics analysis of genome (WGS and Exome) and transcriptome 
(RNA-seq) data. Students will start with raw sequence data for a 
hypothetical case, learn to install and use the tools needed to analyze 
this data on the cloud, and visualize and interpret results. After 
completing the course, students should be in a position to (1) understand 
raw sequence data formats, (2) perform bioinformatics analyses on the 
cloud, (3) run complete analysis pipelines for alignment, variant calling, 
annotation, and RNA-seq (transcriptome analysis approaches will be a major 
component of the workshop), (4) visualize and interpret whole genome, exome 
and RNA-seq results, (5) leverage the identification of passenger variants 
for immunotherapy applications, and (6) begin to place these results in a 
clinical context by use of variant knowledgebases. The data, tools, and 
analysis will be most directly relevant to human genomics and 
bioinformatics research. However, many of the skills and concepts covered 
will be applicable to other human diseases and model organisms. 
Furthermore, many analysis concepts covered during the workshop will be 
broadly applicable to other “big data” research problems. All course 
materials (including copies of presentations, practical exercises, data 
files, and example scripts prepared by the instructing team) will be 
provided electronically to participants.

Monday 29th – Classes from 09:30 to 17:30

Session 1. Introduction to genomics and bioinformatics.
In this session, students will be introduced to key concepts of genomics 
and their application to genomics research and precision medicine in 
cancer. An introduction to next-generation sequencing platforms and related 
bioinformatics approaches will also be provided. Core concepts and tools 
introduced: fundamentals of genome and transcriptome analysis, next-
generation sequencing, precision/personalized medicine approaches (using 
cancer as an exemplar disease).

Session 2. Introduction to genomics data, file formats, QC, and cloud 
analysis.
In this session, students will be introduced to a hypothetical patient case 
and related samples to be analyzed throughout the course. Students will be 
provided with an introduction to the whole genome, exome, transcriptome and 
other data sets we have generated for this test case. Information on where 
to get the raw data and how to access it (and other test data) will be 
provided. Using this data as an example, the students will learn the 
fundamentals of next-generation sequencing (NGS) data formats. The students 
will also be introduced to accessory files needed for analysis including 
reference genomes, reference transcriptomes, and annotation files. Tools 
for QC analysis of raw data will be demonstrated. Since most analysis will 
be performed on the cloud, each student will learn how to launch and log 
into their own cloud compute environment. Students will learn how to 
install bioinformatics tools and learn to use some of the most broadly 
useful toolkits for NGS data. Core concepts and tools introduced: file 
formats (FASTA, FASTQ, SAM/BAM/CRAM, VCF, GTF), bedtools, Picard, samtools, 
FastQC, cloud computing (AWS, EC2).
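
As a rough illustration of the FASTQ format named above (not part of the 
course materials), the sketch below reads four-line FASTQ records and 
converts the quality string to Phred scores. It assumes the common Phred+33 
encoding and unwrapped records; the function name read_fastq and the example 
read are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a FASTQ text stream.

    Assumes each record is exactly four lines and that qualities use the
    Phred+33 encoding (score = ASCII code - 33).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores  # drop the leading '@'


# A made-up single-read example held in memory rather than on disk.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
```

In practice an established parser (for example the ones shipped with 
common bioinformatics libraries) would be preferable to a hand-written one.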

Tuesday 30th – Classes from 09:30 to 17:30

Session 3. Primary genome data analysis (sequence alignment and 
visualization).
In this session, we will begin hands-on analysis of NGS data at the 
command line. Students will log into the cloud and, starting with their own 
copy of the raw data, will align the whole genome and exome data to a 
reference genome. Following alignment, students will conduct a second 
quality analysis of the data and learn to visualize alignments in IGV. Core 
concepts and tools introduced: alignment algorithms, reference indexes, 
BWA, BWA-MEM, alignment indexes, alignment flags, genome browsers, 
duplicate marking, alignment merging and sorting, IGV.
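
To give a sense of what "alignment flags" refers to, here is a minimal 
sketch (not taken from the course materials) that decodes the bitwise FLAG 
field of a SAM record into readable names. The bit values follow the SAM 
specification; the short labels and the function name decode_flag are 
chosen here for illustration only.

```python
# Bit values as defined for the SAM FLAG field; labels are informal.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG integer."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


# 99 = 1 + 2 + 32 + 64, a flag often seen on the first read of a proper pair.
labels = decode_flag(99)
```

The same information is available from the command line via samtools, 
which is one of the tools listed for Session 2.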

Session 4. Whole genome and exome variant calling and annotation.
In this session, we will introduce different algorithms for identifying 
sequence variations of various types from either whole genome or exome data 
(or both). Both germline and somatic variant calling will be covered. For 
each, students will learn strategies for identifying false positives and 
increasing confidence in individual predictions by manual or secondary 
examination of the alignments. Variant types detected will include single 
nucleotide variants (SNVs), small insertions and deletions (indels), copy 
number variants (CNVs) and structural variants (SVs). Students will learn 
strategies for visualizing and presenting variants of each type. After 
producing filtered variant results of each type, annotation methods and 
resources relevant to each variant type will be demonstrated. Core concepts 
and tools introduced: germline variation, somatic variation, variant 
calling, false positives, false negatives, alignment artifacts, manual 
review, svviz, Manta, GATK, Strelka, MuTect, VarScan, CopyCat, Lumpy.

Wednesday 31st – Classes from 09:30 to 17:30

Session 5. RNA-seq analysis (introduction, alignment and abundance 
estimation).
In this session, students will learn about fundamentals of RNA-seq data 
analysis and perform initial QC and alignment of the raw transcriptome 
data. Appropriate sample comparisons for RNA-seq and other experimental 
design and analysis considerations will be discussed in detail. Core 
concepts and tools introduced: reference transcriptomes, normalization, 
batch effects, replicates, spliced alignment algorithms, RNA-seq data 
trimming, RNA assembly algorithms, RNASeqQC, HISAT, StringTie.

Session 6. RNA-seq analysis (fusions, differential expression, and 
clustering).
The uses of transcriptome data in biological research are remarkably 
varied. Students will pursue several strategies in this section. Fusion 
detection, an RNA-seq-specific variant detection approach, will be 
performed. The expression abundance results from the previous section will 
be used to identify a list of highly expressed genes. Comparison to RNA-seq 
data from a cohort of related samples will be used to identify expression 
outliers. Expression clustering algorithms will be used to stratify our 
case using a known expression signature. More advanced classification and 
pathway-based approaches to stratification will be briefly introduced. Core 
concepts and tools introduced: fusion calling, outlier analysis, expression 
clustering, stratification, heatmaps, Ballgown, pizzly.

Thursday 1st – Classes from 09:30 to 17:30

Session 7. Prioritization, visualization and interpretation.
In this session, students will learn about procedures for refining the 
final results obtained from the previous analyses of our case data. Genome 
and transcriptome variant observations will be prioritized according to 
various annotation strategies. These vary from algorithmic predictions of 
pathogenicity to intersecting with results from population databases. 
Students will also learn how to integrate results from the DNA and RNA-seq 
analyses. For example, variants will be prioritized according to their 
expression status, allele specific expression bias, and the abundance of 
associated genes. Fusions predicted in the RNA will be confirmed in the 
DNA. Visualization techniques will be used to place variant observations 
from our case in the context of a cohort of previously sequenced cases with 
the same disease. A group discussion will tackle how to approach creating a 
final clinical interpretation for our example patient. Core concepts and 
tools introduced: allele-specific expression, clonality, GenVisR, gnomAD, 
CADD, bam-readcount, integrate.

Session 8. Gene/variant knowledgebases and clinical actionability.
In this session, students will learn the fundamentals of interpreting 
genome and transcriptome observations in a clinical context. The final 
candidate observations for our example case will be examined using various 
clinical interpretation tools and databases. Core concepts and tools 
introduced: druggability, actionability, sensitivity, resistance, 
predictive variants, diagnostic variants, prognostic variants, predisposing 
variants, the ACMG and AMP guidelines for clinical actionability, variant 
knowledgebases, cBioPortal, CIViC, ClinVar, DGIdb, PharmGKB.

Friday 2nd – Classes from 09:30 to 16:00

Session 9. Leveraging passenger variants (monitoring and immunogenomics).
Up until this point, we have been focused on identifying, annotating and 
interpreting variants that are potentially relevant to disease biology or 
clinical interpretation. These are variants that are deemed functional, 
actionable, or of some known clinical relevance. What about those variants 
that may be unusual or unique to this case but of no known significance? 
What about the “passenger” variants? In this section, we will explore two 
broad strategies that leverage passenger variants in a clinically useful 
way (using cancer as an exemplar disease for this approach). First, we will 
examine their potential use in tracking response to therapy. Second, we 
will explore the possible immunogenomic implications of passenger variants 
by designing a personalized cancer vaccine for our example case. Core 
concepts and tools introduced: cfDNA, serial analysis, immunotherapy, 
pVACtools.

Session 10. Application to your own data
Optional free afternoon to cover previous modules or consult with the team 
of instructors. In this session, students will be free to work on their 
own, or in groups on the previously covered sections. Furthermore, students 
can consult with the team of instructors on their own experiments or get 
practical advice for analyzing their own data. Our hope is to make this 
session as interactive and useful as possible.

To learn more about the team of instructors, please visit 
www.griffithlab.org and 
http://genome.wustl.edu/people/groups/detail/griffith-lab/.

Email [email protected]

Check out our sister sites,
www.PRstatistics.com (Ecology and Life Sciences)
www.PRinformatics.com (Bioinformatics and data science)
www.PSstatistics.com (Behaviour and cognition) 


1.    October 8th – 12th 2018
INTRODUCTION TO FREQUENTIST AND BAYESIAN MIXED (HIERARCHICAL) MODELS 
(IFBM01)
Glasgow, Scotland, Dr Andrew Parnell
https://www.psstatistics.com/course/introduction-to-frequentis-and-bayesian-mixed-models-ifbm01/

2.    October 15th – 19th 2018
APPLIED BAYESIAN MODELLING FOR ECOLOGISTS AND EPIDEMIOLOGISTS (ABME04)
Glasgow, Scotland, Dr. Matt Denwood, Emma Howard
http://www.prstatistics.com/course/applied-bayesian-modelling-ecologists-epidemiologists-abme04/

3.    October 23rd – 25th 2018
INTRODUCTION TO R (This is a private ‘in-house’ course)
London, England, Dr William Hoppitt

4.    October 29th – November 2nd 2018
INTRODUCTION TO R AND STATISTICS FOR BIOLOGISTS (IRFB02)
Glasgow, Scotland, Dr. Olivier Gauthier
https://www.prstatistics.com/course/introduction-to-statistics-and-r-for-biologists-irfb02/

5.    October 29th – November 2nd 2018
INTRODUCTION TO BIOINFORMATICS FOR DNA AND RNA SEQUENCE ANALYSIS (IBDR01)
Glasgow, Scotland, Dr Malachi Griffith, Dr. Obi Griffith
www.prinformatics.com/course/precision-medicine-bioinformatics-from-raw-genome-and-transcriptome-data-to-clinical-interpretation-pmbi01/

6.    November 5th – 8th 2018
PHYLOGENETIC COMPARATIVE METHODS FOR STUDYING DIVERSIFICATION AND 
PHENOTYPIC EVOLUTION (PCME01)
Glasgow, Scotland, Dr. Antigoni Kaliontzopoulou
https://www.prstatistics.com/course/phylogenetic-comparative-methods-for-studying-diversification-and-phenotypic-evolution-pcme01/

7.    November 19th – 23rd 2018
STRUCTURAL EQUATION MODELLING FOR ECOLOGISTS AND EVOLUTIONARY BIOLOGISTS 
(SEMR02)
Glasgow, Scotland, Dr. Jonathan Lefcheck
https://www.prstatistics.com/course/structural-equation-modelling-for-ecologists-and-evolutionary-biologists-semr02/

8.    November 26th – 30th 2018
FUNCTIONAL ECOLOGY FROM ORGANISM TO ECOSYSTEM: THEORY AND COMPUTATION 
(FEER01)
Glasgow, Scotland, Dr. Francesco de Bello, Dr. Lars Götzenberger, Dr. 
Carlos Carmona
http://www.prstatistics.com/course/functional-ecology-from-organism-to-ecosystem-theory-and-computation-feer01/

9.    December 3rd – 7th 2018
INTRODUCTION TO BAYESIAN DATA ANALYSIS FOR SOCIAL AND BEHAVIOURAL SCIENCES 
USING R AND STAN (BDRS01)
Glasgow, Dr. Mark Andrews
https://www.psstatistics.com/course/introduction-to-bayesian-data-analysis-for-social-and-behavioural-sciences-using-r-and-stan-bdrs01/

10.    January 21st – 25th 2019
STATISTICAL MODELLING OF TIME-TO-EVENT DATA USING SURVIVAL ANALYSIS: AN 
INTRODUCTION FOR ANIMAL BEHAVIOURISTS, ECOLOGISTS AND EVOLUTIONARY 
BIOLOGISTS (TTED01)
Glasgow, Scotland, Dr. Will Hoppitt
https://www.psstatistics.com/course/statistical-modelling-of-time-to-event-data-using-survival-analysis-tted01/

11.    January 21st – 25th 2019
ADVANCING IN STATISTICAL MODELLING USING R (ADVR08)
Glasgow, Scotland, Dr. Luc Bussiere, Dr. Tom Houslay
http://www.prstatistics.com/course/advancing-statistical-modelling-using-r-advr08/

12.    January 28th – February 1st 2019
AQUATIC ACOUSTIC TELEMETRY DATA ANALYSIS AND SURVEY DESIGN
Glasgow, Scotland, VEMCO staff and affiliates
https://www.prstatistics.com/course/aquatic-acoustic-telemetry-data-analysis-atda01/

13.    February 4th – 8th 2019
DESIGNING RELIABLE AND EFFICIENT EXPERIMENTS FOR SOCIAL SCIENCES (DRES01)
Glasgow, Scotland, Dr. Daniel Lakens
https://www.psstatistics.com/course/designing-reliable-and-effecient-experiments-for-social-sciences-dres01/

14.    February 11th – 15th 2019
REPRODUCIBLE DATA SCIENCE FOR POPULATION GENETICS
Glasgow, Scotland, Dr. Thibaut Jombart, Dr. Zhian Kamvar
https://www.prstatistics.com/course/reproducible-data-science-for-population-genetics-rdpg02/

15.    25th February – 1st March 2019
MOVEMENT ECOLOGY (MOVE02)
Margam Discovery Centre, Wales, Dr. Luca Borger, Prof. Ronny Wilson, Dr 
Jonathan Potts
https://www.prstatistics.com/course/movement-ecology-move02/

16.    March 4th – 8th 2019
BIOACOUSTIC DATA ANALYSIS
Glasgow, Scotland, Dr. Paul Howden-Leach
https://www.prstatistics.com/course/bioacoustics-for-ecologists-hardware-survey-design-and-data-analysis-biac01/

17.    March 11th – 15th 2019
ECOLOGICAL NICHE MODELLING USING R (ENMR03)
Glasgow, Scotland, Dr. Neftali Sillero
http://www.prstatistics.com/course/ecological-niche-modelling-using-r-enmr03/

18.    March 18th – 22nd 2019
INTRODUCTION TO STATISTICS AND R FOR EVERYONE (IRFE01)
Crete, Greece, Dr Aristides (Aris) Moustakas
https://www.prstatistics.com/course/introduction-to-statistics-and-r-for-anyone-irfe01/

19.    March 25th – 29th 2019
LANDSCAPE GENETIC DATA ANALYSIS USING R (LNDG03)
Glasgow, Scotland, Prof. Rodney Dyer
http://www.prstatistics.com/course/landscape-genetic-data-analysis-using-r-lndg03/

20.    April 1st – 5th 2019
INTRODUCTION TO STATISTICAL MODELLING FOR PSYCHOLOGISTS USING R (IPSY01)
Glasgow, Scotland, Dr. Dale Barr, Dr. Luc Bussiere
http://www.psstatistics.com/course/introduction-to-statistics-using-r-for-psychologists-ipsy02/

21.    April 1st – 5th 2019
INDIVIDUAL BASED MODELS FOR ECOLOGISTS (IBME01)
Glasgow, Scotland, Dr Aristides (Aris) Moustakas
Link to follow

22.    April 8th – 12th 2019
MACHINE LEARNING 
Glasgow, Scotland, Dr Aristides (Aris) Moustakas
https://www.prstatistics.com/course/machine-learning-using-r-mlur01/

23.    April 8th – 12th 2019
Spatial modelling, analysis and statistical inference of genomic data 
(SMAG01)
Crete, Greece, Dr Matt Fitzpatrick
https://www.prstatistics.com/course/spatial-modelling-analysis-and-statistical-inference-of-genomic-data-smag01/

24.    May 6th – 10th 2019
MARK RECAPTURE METHODS AND DATA ANALYSIS FOR ECOLOGISTS (MRKR01)
Myuna Bay, Australia, TBC

25.    May 16th – 18th 2019 (please note this is a 3-day course from 
Thursday to Saturday)
Aquatic movement ecology using R (AMER01) 
Myuna Bay, Australia, TBC

26.    May 16th – 19th 2019 (please note this is a 4-day course from 
Thursday to Monday)
Introduction to R for everyone (IRFE02)
Myuna Bay, Australia, Dr Aristides (Aris) Moustakas

27.    May 20th – 24th 2019
MODEL-BASED MULTIVARIATE ANALYSIS OF ABUNDANCE DATA USING R (MBMV03)
Myuna Bay, Australia, Prof. David Warton
https://www.prstatistics.com/course/model-based-multivariate-analysis-of-abundance-data-using-r-mbmv03/

28.    May 21st – 24th 2019
A statistical tool box for ecologists (STBE01)
Myuna Bay, Australia, Dr Aristides (Aris) Moustakas

29.    June 10th – 14th 2019
STABLE ISOTOPE MIXING MODELS USING SIAR, SIBER AND MIXSIAR (SIMM04)
Glasgow, Scotland, Dr. Andrew Parnell, Dr. Andrew Jackson 
www.prstatistics.com/course/stable-isotope-mixing-models-using-r-simm04/

30.    June 17th – 21st 2019
INTRODUCTION TO PYTHON FOR BIOLOGISTS (IPYB06)
Glasgow, Scotland, Dr. Martin Jones
http://www.prinformatics.com/course/introduction-to-python-for-biologists-ipyb06/

31.    June 24th – 28th 2019
ADVANCED PYTHON FOR BIOLOGISTS (APYB03)
Glasgow, Scotland, Dr. Martin Jones
www.prinformatics.com/course/advanced-python-biologists-apyb03/

32.    July 1st – 5th 2019
DATA VISUALISATION AND MANIPULATION USING PYTHON (DVMP01)
Glasgow, Scotland, Dr. Martin Jones
http://www.prinformatics.com/course/data-visualisation-and-manipulation-using-python-dvmp01/

33.    October 7th – 11th 2019
CONSERVATION PLANNING USING PRIORITIZR: FROM THEORY TO PRACTICE (PRTZ01)
Crete, Greece, Dr Richard Schuster and Nina Morell
https://www.prstatistics.com/course/conservation-planning-using-prioritizr-from-theory-to-practice-prtz01/

34.    October 21st – 25th 2019
A COMPLETE GUIDE TO MIXED MODELS (INCLUDING TEMPORAL AND SPATIAL 
AUTOCORRELATION) (MMTS01) 
Crete, Greece, Dr Aristides (Aris) Moustakas
https://www.prstatistics.com/course/a-complete-guide-to-mixed-models-including-temporal-and-spatial-autocorrelation-mmts01/
