[CC everybody including the biohaskell list. Let me know if any of you want off. :-) ]
Pjotr Prins <pjotr2...@thebird.nl> writes:

> http://www.open-bio.org/wiki/Google_Summer_of_Code

> For Biopython (3x), BioRuby (5x) and BioJava (4x) I found project ideas.
> The others are missing.

> There is still a (rather small) window of opportunity for adding
> ideas.

I have one thing that might work well as a SOC project, if the right student could be found.

Basically, a colleague and I recently developed and published a method and implementation for more sensitive pairwise alignments. The paper is here, I think (PLoS ONE seems to be down atm):

  http://dx.plos.org/10.1371/journal.pone.0054422

I'm really happy about the results; if nothing else, check the SCOP benchmark. Although it's difficult to construct a good test case using more complex methods (training sets for HMMs and whatnot), I don't know of anything that is as good as this. We're using it for annotation of genes.

The current implementation is in Haskell, and although it works correctly, it is a bit slow and, more problematically, it consumes too much memory (so going multi-threaded, although pretty easy, won't be of any help).

I would like to make this into a less resource-intensive (and thus more practical) tool, and there are two ways I can think of to go about this:

1) Optimize the Haskell program
2) Reimplement the algorithm (or parts of it) in a different language

Advantages of 1:

* We already have a working program, and the type system makes it easy to refactor without introducing errors.
* Haskell supports lots of good multi-threaded programming models (like STM).
* I know Haskell pretty well, and will hopefully be able to mentor.

Disadvantages:

* Haskell has some good debugging tools, but they tend to work really poorly for programs that use a lot of memory (i.e. it takes a long time to generate profiles).
* Needs somebody with a bit (or a lot) of experience optimizing Haskell, and good knowledge of high-perf libraries (like vector).

Advantages of 2:

* Easier to get a student with adequate skills.
* More predictable performance models in other languages.
* Easier to compile and install for many users.

Disadvantages:

* Ideally, the student should know enough Haskell to read and understand the existing code.
* Likely needs a co-mentor with knowledge of the language in question.

Is this something I could or should submit as a task?

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants