> I claim that one could do a lot more resource-friendly and flexible
> alignments by tweaking the core alignment algorithm directly. In
> Haskell this can be done with relative ease (see below), but we lack a
> fast and compact full-text index. Instead of re-writing one, we should
> first try to use the existing, optimized, index structures. These come
> with efficient creation programs, so that we may focus on the
> alignment rather than the indexing.
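To make the claim above a bit more concrete (tweak the alignment algorithm in Haskell, reuse an existing index), here is a minimal sketch under stated assumptions. The class `SuffixTrie` and its methods `step` and `occurrences` are hypothetical; they are not the class proposed in the quoted message, whose definition is not reproduced here. The search `approxMatches` (a simple backtracking k-mismatch search) is written only against the class. The `Naive` list-of-suffixes instance is a stand-in for a binding to a real FM-index or suffix array, which this sketch does not provide.

```haskell
import Data.List (tails)

-- | Hypothetical interface: a full-text index seen as a suffix trie.
-- A node stands for one string s occurring in the text.
class SuffixTrie n where
  -- | Extend the current string by one character; Nothing if the
  -- extended string does not occur in the text.
  step :: n -> Char -> Maybe n
  -- | Number of occurrences of the current string in the text.
  occurrences :: n -> Int

-- | Naive stand-in for a real index: each suffix is kept as
-- (start position, characters not yet read).
newtype Naive = Naive [(Int, String)]

-- | Root node: all non-empty suffixes of the text.
root :: String -> Naive
root text = Naive (zip [0 ..] (init (tails text)))

-- | Start positions of the occurrences represented by a node.
positions :: Naive -> [Int]
positions (Naive sufs) = map fst sufs

instance SuffixTrie Naive where
  step (Naive sufs) c =
    case [(i, rest) | (i, x : rest) <- sufs, x == c] of
      [] -> Nothing
      hits -> Just (Naive hits)
  occurrences (Naive sufs) = length sufs

-- | Backtracking search allowing at most maxMis substitutions (Hamming
-- distance). Returns one node per distinct text substring within the
-- budget, paired with the number of mismatches it used.
approxMatches :: SuffixTrie n => [Char] -> Int -> n -> String -> [(n, Int)]
approxMatches alphabet maxMis = go 0
  where
    go used node [] = [(node, used)]
    go used node (p : ps) =
      concat
        [ go used' node' ps
        | c <- alphabet
        , let used' = if c == p then used else used + 1
        , used' <= maxMis -- prune branches that exceed the mismatch budget
        , Just node' <- [step node c] -- skip characters absent from the index
        ]
```

For example, `concatMap (positions . fst) (approxMatches "ACGT" 1 (root "ACGTACTA") "ACT")` evaluates to `[0,4]` (an exact hit at 4, one mismatch at 0). The intended split is that a different index only needs another instance, while a different scoring scheme (edit distance, quality-aware penalties) only touches the search function.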
I agree! I've toyed with a more generic approach to alignment algorithms (variations on Smith-Waterman rather than short-read mapping, but anyway), and I think we could build the infrastructure (i.e. a nice DSL) for constructing tailored alignment algorithms in Haskell.

> Hence reverse-engineering the FM-Index or Suffix Array

...file formats, not the algorithms, right? Once upon a time, Ferragina and Navarro did a comparison of the different compressed indices (http://pizzachili.dcc.uchile.cl/). I guess bowtie and bwa "won" in the end, but perhaps this is interesting because they defined a standard interface to these structures (click on the API tab).

> may do the
> biohaskellers working with sequencing data a good deed. Apart from the
> raw representation, my student and I thought about a typeclass on top
> of which to formulate new alignment algorithms. We propose a class
> that essentially has the suffix trie as its free algebra:

I have to admit I don't follow this argument entirely. Although I can write basic programs in Haskell, the last sentence is slightly beyond my grasp. If you want me to understand it, I need a more detailed explanation, with examples etc.

But, yes, being able to read a compressed index and query it would be a nice addition.

- - -

To return to the original question here: couldn't we just add a TODO page (or category) to the Wiki? People sometimes contact me about previous (and, for me, now less interesting) GSoC proposals, so a page listing good starting points for people looking to get into Haskell and bioinformatics might be very useful.

-k
--
If I haven't seen further, it is by standing in the footprints of giants