[ECOLOG-L] Course:Genome Assembly and Annotation. Berlin

Carlo Pecoraro Mon, 26 Nov 2018 09:53:29 -0800

Dear all,


we are pleased to inform you that we will run the 2nd edition of our "Assembly 
and Annotation of genomes" course from the 11th to the 15th of February 2019 
in Berlin (Germany).

 

https://www.physalia-courses.org/courses-workshops/course20/

 

 

 

 

Application deadline is: January 10th, 2019.

 

 

 

 

 

Instructor:

 

 

 

Dr. Thomas D. Otto (University of Glasgow, UK; 
https://www.physalia-courses.org/instructors/t28/)

 

 

 

Assistant instructor:

 

 

 

Mr. Maximilian Driller (Begendiv, Germany; http://bit.ly/2zcwmQT)

 

 

 

 

 

Overview

 

 

This course will introduce biologists and bioinformaticians to the concepts of 
de novo assembly and annotation. Data from different technologies (Illumina, 
PacBio, Oxford Nanopore and possibly 10X) will be combined with approaches such 
as error correction and HiC scaffolding to generate good draft assemblies. 
Particular attention will be given to quality control of the assemblies and to 
understanding how errors occur. Annotation tools that use RNA-Seq data will 
also be introduced, and an outlook on potential downstream analyses will be 
given. By the end of the course, students should understand what is needed to 
generate a good, annotated genome.

 

 

 

 

 

Target Audience & Assumed Background

 

 

The course is aimed at researchers interested in learning more about genome 
assembly and annotation. It will include information useful for both the 
beginner and the more advanced user. We will start by introducing general 
concepts and then describe, step by step, all major components of a 
genome assembly and annotation workflow, from raw data all the way to a final 
assembled and annotated genome. There will be a mix of lectures and hands-on 
practical exercises using the Linux command line.

 

Attendees should have a background in biology. We will dedicate one session to 
some basic and advanced Linux concepts. Attendees should also have some 
familiarity with genomic data such as that arising from NGS sequencers.

 

 

 

 

 

Session content

Monday 11th – Classes from 09:30 to 17:30 - “get it started”

 

Session 1: Introduction (morning)

 

In this session I will kick off with an introductory lecture about genome 
assembly and annotation - the past, the present and the future. I will use this 
introduction to motivate the five-day course. Next, I will explain the use of 
the virtual machine (VM) and of cloud computing. This is followed by 
a short introduction to Linux. During the morning we will kick off our first 
assembly and put it through an annotation tool (Companion).

 

 

 

Session 2: Visualization (half afternoon)

 

During this afternoon, we are going to visualize the assembled and annotated 
genome from this morning in Artemis. The aim is to use the viewer to inspect 
the annotation, correct it and write out files. Next, we are going to 
perform a comparative exercise (comparing the genome from the morning with a 
close reference) to understand the concepts of synteny, breakpoints and errors.

 

 

 

Session 3: Mapping

 

In this module, I will teach the basics of read mapping. We will map reads 
with bwa mem onto a reference and will examine duplications and errors through 
improperly mapped read pairs. This is important for examining the correctness of 
assemblies and will be used later in the week.
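
As an illustration of what "improperly mapped read pairs" means in practice, 
here is a minimal Python sketch. It is not course material. It assumes the 
mapper output is available as plain SAM text lines (bwa mem writes SAM) and 
relies only on the FLAG bits defined in the SAM specification. The function 
names is_improper_pair and improper_pair_names are hypothetical, chosen for 
this example.

```python
# FLAG bits as defined in the SAM specification.
PAIRED = 0x1          # read is part of a pair
PROPER_PAIR = 0x2     # aligner considers the pair properly aligned
UNMAPPED = 0x4        # read itself is unmapped
SECONDARY = 0x100     # secondary alignment
SUPPLEMENTARY = 0x800 # supplementary (split) alignment


def is_improper_pair(flag: int) -> bool:
    """True for a mapped, primary, paired read not flagged as a proper pair."""
    if not flag & PAIRED:
        return False
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    return not flag & PROPER_PAIR


def improper_pair_names(sam_lines):
    """Return read names of improperly paired alignments from SAM text lines."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if is_improper_pair(int(fields[1])):
            names.append(fields[0])
    return names
```

Regions where many such reads pile up are candidates for a closer look, for 
example in a genome viewer. What counts as "proper" depends on the aligner 
and on the insert size distribution it estimated.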

 

 

 

 

 

Tuesday 12th – Classes from 09:30 to 17:30 -  “learn it the old way”

 

Session 4: De Bruijn graph and PAGIT

 

This module is dedicated to short read assembly. Although short reads may be 
superseded by long reads, understanding the concepts behind short read assembly 
and the De Bruijn graph is crucial. After a seminar on this subject, we will 
assemble the same genome as before, but this time with Illumina data: de novo 
assembly with Velvet, contig ordering and error correction. Through comparative 
genomics we will look at errors in the assembly and at how they can be found by 
remapping short reads and split long reads. Finally, we will compare this 
assembly to the one from Monday. This session will continue into Tuesday 
afternoon.
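
For readers new to the idea, the following Python sketch shows how a De Bruijn 
graph can be built from reads. It is a toy illustration under simplifying 
assumptions (error-free reads, one strand only, no coverage counts) and is not 
how Velvet or any other assembler is implemented. The function name 
de_bruijn_graph is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers, each distinct k-mer is an edge.

    Returns a dict mapping each prefix (k-1)-mer to the list of suffix
    (k-1)-mers it connects to, in order of first appearance.
    """
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue  # each distinct k-mer contributes one edge
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Two overlapping toy reads; k=3 gives 2-mer nodes.
    print(de_bruijn_graph(["ACGT", "CGTA"], 3))
```

A node with more than one outgoing edge marks a point where the reads alone 
do not determine a unique path, which is where repeats and sequencing errors 
complicate short read assembly.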

 

 

 

Session 5: RNA-Seq

 

In this session, we will analyse the transcriptome of the sample we have 
assembled so far, introduced by a short talk. In the exercise, we will map 
RNA-Seq reads (both short and long), first to understand the basics of RNA-Seq 
and then to correct gene models. We will also discuss the concept of 
alternative splicing.

 

 Finally, we will annotate our assembly with Augustus, using the mapped RNA-Seq 
data and some manually corrected genes.

 

 

 

 

 

Wednesday 13th – Classes from 09:30 to 17:30 - “do it yourself”

 

Session 6: Large genome assembly

 

First, we are going to kick off an assembly of a larger genome and let it run 
in the cloud throughout the day and overnight. It will be important to check 
during the day that the assembly is still running.

 

 

 

Session 7: Group Task I

 

Group task I: You will get a set of reads (from a random technology) and need 
to generate a draft genome assembly. Due to time restrictions, the reads will be 
from a bacterial genome, and you will need to:

 

            Assemble the genome
            Check the assembly
            Annotate the genome

 

This task will be done in groups of 2-3 students. During the day, I will 
introduce some of the newer tools and topics, such as bacterial annotation and 
circularization, through short talks.

 

 

 

 

 

Thursday 14th – Classes from 09:30 to 17:30 - “apply your knowledge to a 
real-world example”

 

Session 8: Group Task I continued

 

First, each group will present the results of the group task: the quality of 
the assembly and the number of annotated genes, together with an outlook on 
which analyses they would do next and why.
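
One widely used summary of assembly contiguity is N50. The announcement does 
not say which metrics the course uses, so the Python sketch below is only an 
assumed example of the kind of number a group might report. The function name 
n50 is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Contig lengths would normally be read from the assembly FASTA file.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

N50 describes contiguity only. It says nothing about correctness, so it is 
best read alongside checks such as remapping reads and comparison with a 
reference.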

 

 

 

Session 9: Large genome assembly continued

 

For the rest of the morning, we are going to analyse the larger genome assembly 
started on Wednesday morning. How did it come along? How big is it? What is it? 
How much compute did it take?

 

 

 

Session 10: Group Task II

 

Group task II, pick 2 projects: During the next 24h, each group will work on two 
different projects. The options include: assembly of genomes of 1-40 Mb in size, 
virus assembly, using HiC data for assembly, comparing different sequencing 
technologies, comparing different assembly tools, evaluating the quality of 
assemblies, annotating a large genome, etc. Each group will pick 2 projects and 
needs to manage its time (what to set running and when, and what to run 
overnight). Importantly, students can also analyse their own data as 
a project, if they can convince the other group members that their project is 
interesting.

 

These projects simulate more “real life” examples than the data from the 
exercises: the raw reads come from the short read archives, some may be of bad 
quality, others are contaminated, etc.

 

 

 

 

 

Friday 15th – Classes from 09:30 to 17:30

 

Session 11: Group Task II continued

 

We will continue with the group task.

 

 

 

Session 12: Group Task II presentation

 

In this session, each group will present the results of their work. Each group 
member will have to present part of the projects. This is followed by a 
general discussion about the group tasks and the course.

 

 

 

 

 

For more information about the course, please visit our website: 
https://www.physalia-courses.org/courses-workshops/course20/

 

 

 

 

 

Here is the full list of our courses and workshops: 
https://www.physalia-courses.org/courses-workshops/

 

 

 

 

 

Best regards,

 

Carlo

 

 

 



--------------------

Carlo Pecoraro, Ph.D


Physalia-courses DIRECTOR

i...@physalia-courses.org

http://www.physalia-courses.org/

Twitter: @physacourses

mobile: +49 15771084054

https://groups.google.com/forum/#!forum/physalia-courses
