> I claim that one could do a lot more resource-friendly and flexible
> alignments by tweaking the core alignment algorithm directly. In
> Haskell this can be done with relative ease (see below), but we lack a
> fast and compact full-text index. Instead of re-writing one, we should
> first try to use the existing, optimized, index structures. These come
> with efficient creation programs, so that we may focus on the
> alignment rather than the indexing.  

I agree!  I've toyed with a more generic approach to alignment
algorithms (variations on Smith-Waterman rather than short-read
mapping, but anyway), and I think we could build the infrastructure
(i.e. a nice DSL) for constructing tailored alignment algorithms in
Haskell.
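
To make "tailored" a bit more concrete, here is a minimal sketch of the
kind of thing I mean: a Smith-Waterman local alignment score where the
scoring scheme is an ordinary value you pass in.  The names (Scoring,
subst, gap, localScore, simpleDna) are made up for illustration, the gap
model is linear only, and it returns just the score, not the alignment.

```haskell
import Data.List (foldl')

-- | Hypothetical scoring scheme; field names are illustrative only.
data Scoring = Scoring
  { subst :: Char -> Char -> Int  -- score for aligning two residues
  , gap   :: Int                  -- linear gap penalty (a negative number)
  }

-- | Example scheme (an assumption): match +2, mismatch -1, gap -1.
simpleDna :: Scoring
simpleDna = Scoring (\a b -> if a == b then 2 else -1) (-1)

-- | Best local alignment score (Smith-Waterman, linear gaps).
--   Only the previous row of the DP matrix is kept.
localScore :: Scoring -> String -> String -> Int
localScore sc xs ys = fst (foldl' step (0, replicate (length ys + 1) 0) xs)
  where
    -- Fold over the residues of xs, tracking the best cell seen so far.
    step (best, prev) x =
      let row = nextRow x prev
      in (max best (maximum row), row)

    -- prev is row i-1; column 0 of every row is 0.
    nextRow x prev = scanl cell 0 (zip3 ys prev (tail prev))
      where
        cell left (y, diag, up) =
          maximum [0, diag + subst sc x y, up + gap sc, left + gap sc]
```

Swapping in a different Scoring value (a substitution matrix lookup,
say) changes the algorithm's behaviour without touching the recursion,
which is roughly where I would expect a DSL to start.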

> Hence reverse-engineering the FM-Index or Suffix Array

...the file formats, not the algorithms, right?

Once upon a time, Ferragina and Navarro did a comparison of the
different compressed indices (http://pizzachili.dcc.uchile.cl/).  I
guess bowtie and bwa "won" in the end, but the site may still be
interesting because they defined a standard interface to these
structures (click on the API tab).
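
As far as I recall, that interface centres on operations like count,
locate and extract.  Below is a rough guess at how such an interface
could look as a Haskell class.  It is not a binding to their code: the
class and method names are my own, and the instance is a naive,
uncompressed sorted-suffix list that only serves to make the sketch
runnable.

```haskell
import Data.List (isPrefixOf, sort, tails)

-- | Hypothetical interface, loosely modelled on count/locate/extract.
class TextIndex ix where
  build   :: String -> ix
  count   :: ix -> String -> Int         -- number of occurrences of a pattern
  locate  :: ix -> String -> [Int]       -- 0-based start positions, ascending
  extract :: ix -> Int -> Int -> String  -- substring given start and length

-- | Naive stand-in: the text plus its suffixes in sorted order.
--   Neither compressed nor fast; it is for illustration only.
data NaiveSA = NaiveSA String [(String, Int)]

instance TextIndex NaiveSA where
  build t = NaiveSA t (sort (zip (init (tails t)) [0 ..]))

  count ix p = length (locate ix p)

  -- Suffixes starting with p are contiguous in sorted order, so a linear
  -- scan finds the block (a real index would use binary or backward search).
  locate (NaiveSA _ sa) p =
    sort (map snd (takeWhile hit (dropWhile (not . hit) sa)))
    where
      hit (s, _) = p `isPrefixOf` s

  extract (NaiveSA t _) from len = take len (drop from t)
```

An alignment algorithm written against the class would then not care
whether the instance is this toy or a wrapper around an existing
on-disk index.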

> may do the
> biohaskellers working with sequencing data a good deed. Apart from the
> raw representation, my student and I thought about a typeclass on top
> of which to formulate new alignment algorithms. We propose a class
> that essentially has the suffix trie as its free algebra:  

I have to admit I don't follow this argument entirely.  Although I can
write basic programs in Haskell, the last sentence is slightly beyond my
grasp.  If you want me to understand it, I need a more detailed
explanation, with examples etc.

But, yes, being able to read a compressed index and query it would be a
nice addition.

 - - -

To return to the original question: couldn't we just add a TODO page
(or category) to the Wiki?  I sometimes get people contacting me about
previous (and for me now less interesting) GSoC proposals, so having a
page listing good starting points for people looking to get into
Haskell and bioinformatics might be very useful.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
