Past meeting: July Houston Hadoop Meetup - Genomic data analysis with hadoop

Mark Kerzner Tue, 17 Jul 2012 14:30:24 -0700

Hi, all,

here is what it was about:


July Houston Hadoop Meetup - Genomic data analysis with
hadoop<http://shmsoft.blogspot.com/2012/07/july-houston-hadoop-meetup-genomic-data.html>

Dianhui
(Dennis) Zhu presented "Genomic data analysis with hadoop". He talked
about using the Hadoop framework to do pattern search in genomic sequence
datasets. The talk was based on his three-year project at Baylor, which started
using Hadoop a year ago. Dennis is a Senior Scientific Programmer at HGSC.

Dianhui covered the following topics:

1. Setting up a Hadoop test cluster with 4 nodes.
2. A code walkthrough and unit testing with Mockito and MRUnit.
3. A live demo: running our Hadoop application on the 4-node cluster.

The interesting technical problem that Dennis showed was how to break the
sequence into chunks before it reaches the Mapper. This is usually trivial in
regular applications, but it is quite hard with the long, unstructured
sequence data of a genome. The audience analyzed the actual code, asked many
questions, and wanted to compare it with existing open source projects.
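
The talk summary does not say how Dennis's code handles chunk boundaries, so
what follows is only a sketch of one common approach, not his implementation.
When you search for a pattern, a match can straddle the point where the
sequence is cut. A simple fix is to give every chunk an overlap of (pattern
length - 1) bases and let each chunk report only the matches that start in its
own "owned" region. The class and method names (OverlappingChunker, split,
findInChunk, findAll) are hypothetical, and the code is plain Java with no
Hadoop dependency. In a real job, this splitting would likely live in a custom
InputFormat/RecordReader, with findInChunk playing the role of the Mapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the code shown at the meetup): split one long
// sequence into fixed-size chunks that overlap by (pattern length - 1) bases,
// so any match that straddles a chunk boundary lies wholly inside one chunk.
final class OverlappingChunker {

    /** A chunk of the sequence and the global offset of its first base. */
    static final class Chunk {
        final long offset;  // position of bases.charAt(0) in the full sequence
        final String bases; // owned region plus the overlap tail

        Chunk(long offset, String bases) {
            this.offset = offset;
            this.bases = bases;
        }
    }

    /** Split seq into chunks of chunkSize owned bases, each extended by overlap bases. */
    static List<Chunk> split(String seq, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0) {
            throw new IllegalArgumentException("chunkSize must be > 0 and overlap >= 0");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (int start = 0; start < seq.length(); start += chunkSize) {
            int end = Math.min(seq.length(), start + chunkSize + overlap);
            chunks.add(new Chunk(start, seq.substring(start, end)));
        }
        return chunks;
    }

    /**
     * Mapper-side search over one chunk. Only matches that START in the owned
     * region are reported, so the overlap never produces duplicate hits.
     */
    static List<Long> findInChunk(Chunk c, String pattern, int chunkSize) {
        List<Long> hits = new ArrayList<>();
        int owned = Math.min(chunkSize, c.bases.length());
        for (int i = c.bases.indexOf(pattern); i >= 0 && i < owned;
                i = c.bases.indexOf(pattern, i + 1)) {
            hits.add(c.offset + i);
        }
        return hits;
    }

    /** Run the per-chunk search over every chunk and gather all global positions. */
    static List<Long> findAll(String seq, String pattern, int chunkSize) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        List<Long> all = new ArrayList<>();
        for (Chunk c : split(seq, chunkSize, pattern.length() - 1)) {
            all.addAll(findInChunk(c, pattern, chunkSize));
        }
        return all;
    }

    public static void main(String[] args) {
        // The first GATTACA (positions 5-11) crosses the chunk boundary at position 8.
        String genome = "CCCCCGATTACAGGGGATTACA";
        System.out.println(findAll(genome, "GATTACA", 8)); // prints [5, 15]
    }
}
```

Each chunk owns only its first chunkSize bases, so the results match a
single-pass search over the whole sequence. The overlap tail exists only to
let boundary-crossing matches complete.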

Indeed, there is an article on the Cloudera blog,
http://www.cloudera.com/blog/2009/10/analyzing-human-genomes-with-hadoop/,
and it refers to the Crossbow open source project,
http://bowtie-bio.sourceforge.net/crossbow/index.shtml. It will be interesting
to see how that compares to Dennis's work.
