CloudBurst: Hadoop for DNA Sequence Analysis

Michael Schatz Wed, 08 Apr 2009 21:19:35 -0700

Hadoop Users,

I wanted to announce that my Hadoop application, CloudBurst, is now available
as open source at:
http://cloudburst-bio.sourceforge.net


In a nutshell, it is an application for mapping millions of short DNA
sequences to a reference genome in order to, for example, catalog the
differences between one individual's genome and the reference. As you might
imagine, this is a very data-intensive problem, but Hadoop enables the
application to scale up linearly to large clusters.
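
To make the read-mapping idea concrete, here is a minimal sketch of how it can
be phrased as map, shuffle, and reduce steps. It is a toy in plain Python that
runs in memory, not CloudBurst's actual code and not the Hadoop API. The
function names, the seed length, and the sample sequences are all hypothetical,
chosen only for illustration.

```python
from collections import defaultdict

SEED_LEN = 4  # hypothetical seed length, kept small for the toy data


def map_phase(reference, reads):
    """Emit (seed, record) pairs, as a MapReduce mapper might."""
    # Every SEED_LEN-long window of the reference, keyed by its sequence.
    for pos in range(len(reference) - SEED_LEN + 1):
        yield reference[pos:pos + SEED_LEN], ("ref", pos)
    # Only the leading seed of each read, to keep the sketch short.
    for read_id, read in reads.items():
        yield read[:SEED_LEN], ("read", read_id)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reference, reads, max_mismatches=1):
    """For each shared seed, extend the match and count mismatches."""
    hits = []
    for values in groups.values():
        ref_positions = [v for tag, v in values if tag == "ref"]
        read_ids = [v for tag, v in values if tag == "read"]
        for read_id in read_ids:
            read = reads[read_id]
            for pos in ref_positions:
                window = reference[pos:pos + len(read)]
                if len(window) < len(read):
                    continue  # read would run off the end of the reference
                mismatches = sum(a != b for a, b in zip(read, window))
                if mismatches <= max_mismatches:
                    hits.append((read_id, pos, mismatches))
    return sorted(hits)


# Made-up sample data, for illustration only.
reference = "ACGTTGCAACGTAGGC"
reads = {"r1": "TTGCAACG", "r2": "ACGTAGCC"}
alignments = reduce_phase(shuffle(map_phase(reference, reads)), reference, reads)
print(alignments)
```

Each hit is a (read id, reference position, mismatch count) tuple. Because
each seed group is processed independently in the reduce step, the groups
could in principle be spread across many machines, which is the property a
framework like Hadoop exploits.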

A full description of the program is available in the journal
Bioinformatics:
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btp236

I also wanted to take this opportunity to thank everyone on this mailing
list. The discussions posted were essential for navigating the ins and outs
of Hadoop during the development of CloudBurst.

Thanks everyone!

Michael Schatz

http://www.cbcb.umd.edu/~mschatz
