Machine learning approaches for understanding the genetic basis of complex traits

Friday, October 26, 2012 - 10:00am - 10:50am
KEC 1005

Su-In Lee
Assistant Professor
Computer Science and Engineering and Genome Sciences
University of Washington

Speaker Biography: Su-In Lee is an Assistant Professor of Computer Science & Engineering and Genome Sciences at the University of Washington, Seattle. Her group is broadly interested in developing advanced machine learning algorithms to solve important problems in genetics and molecular biology. The goals of her current research projects can be summarized as follows: (1) building probabilistic models that represent various levels of gene regulation; (2) inferring causal pathways from genetic and environmental influences to complex phenotypic traits such as diseases; and (3) developing a computational framework for personalized medicine.

She completed her PhD in January 2009 under the supervision of Professor Daphne 
Koller at Stanford University. Su-In graduated summa cum laude with a B.Sc. in 
Electrical Engineering and Computer Science from the Korea Advanced Institute of 
Science and Technology.

Abstract:
Humans differ in many "phenotypes" such as weight, hair color and, more 
importantly, disease susceptibility. These phenotypes are largely determined by each 
individual's specific genotype, stored in the 3.2 billion bases of his or her genome 
sequence. Deciphering the genome by finding which sequence variations cause a given 
phenotype would have a great impact. The recent advent of high-throughput genotyping 
methods has made it possible to obtain an individual's sequence information on a 
genome-wide scale. Classical approaches have focused on identifying which sequence variations are 
associated with a particular phenotype. However, the complexity of cellular mechanisms, 
through which sequence variations cause a particular phenotype, makes it difficult to 
directly infer such causal relationships. In this talk, I will present statistical 
machine learning approaches that address these challenges by explicitly modeling the 
cellular mechanisms induced by sequence variations. For example, one of the approaches 
takes genome-wide expression measurements as input and aims to generate a finer-grained 
hypothesis such as "sequence variation S induces cellular processes M, which lead to 
changes in phenotype P". Furthermore, we have developed a general machine learning 
technique, named the "meta-prior algorithm", which learns the regulatory potential of each 
sequence variation from its intrinsic characteristics. This helps identify the true causal 
sequence variation among multiple candidate sequence variations in the same chromosomal 
region. Our approaches have led to novel insights into sequence variations, and some of the 
hypotheses have been validated through biological experiments. Many of our machine learning 
techniques are applicable to a wide range of other problems; as an example, I will present 
the meta-prior algorithm in the context of movie rating prediction using the Netflix 
data set.
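The abstract does not spell out how the meta-prior algorithm works. Purely as an illustration of the general idea of a feature-based prior that helps pick the causal variant within a region, here is a minimal, hypothetical sketch. It assumes: exactly one causal variant per region; a log-linear (softmax) prior over that region's variants, driven by per-variant feature vectors; and a per-variant Bayes factor from some association test. The weights are fit by gradient ascent on the marginal likelihood pooled across regions. Every function name, feature and number below is illustrative and is not the speaker's actual method.

```python
import math
from typing import List, Sequence, Tuple

def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def variant_prior(w, features):
    """Hypothetical prior that variant i is the causal one: softmax of w . x_i."""
    return softmax([dot(w, x) for x in features])

def variant_posterior(w, features, bayes_factors):
    """Posterior over the causal variant, assuming exactly one per region."""
    prior = variant_prior(w, features)
    unnorm = [p * bf for p, bf in zip(prior, bayes_factors)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def learn_prior_weights(regions: List[Tuple[list, list]], n_features: int,
                        lr: float = 0.1, steps: int = 200) -> List[float]:
    """Gradient ascent on sum over regions of log(sum_i prior_i * BF_i).

    For each region the gradient is E_posterior[x] - E_prior[x], so features
    that are enriched among well-supported variants receive positive weight.
    """
    w = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for features, bfs in regions:
            prior = variant_prior(w, features)
            post = variant_posterior(w, features, bfs)
            for i, x in enumerate(features):
                for k in range(n_features):
                    grad[k] += (post[i] - prior[i]) * x[k]
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w
```

In a toy run, each region has one variant carrying a hypothetical binary feature, and that variant also has the strongest association. The learned weight then becomes positive, so in a new region where two variants have equal Bayes factors, the variant with the feature is ranked higher:

```python
regions = [
    ([[1.0], [0.0], [0.0]], [20.0, 1.0, 1.0]),
    ([[0.0], [1.0], [0.0]], [1.0, 15.0, 2.0]),
    ([[0.0], [0.0], [1.0]], [1.0, 1.0, 30.0]),
]
w = learn_prior_weights(regions, n_features=1)
print(variant_posterior(w, [[1.0], [0.0]], [5.0, 5.0]))
```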