Olaf,

Are most other biohaskell members interested in the functionality you suggested for the bowtie implementation?

I looked a little at the bowtie C++ source. Mounds of code :-)

OK ... we need to look for "invariants" between different software architectures (not exactly as in pure maths, but something like it, IMO). Sorry, I know my thoughts are very hard to follow. In software the notion of "invariant" is very loose. I am thinking "out loud", i.e. thinking as I am writing.

OK ... the Johns Hopkins author made architectural decisions that are linked to 1) his language choice, i.e. C++, and 2) his own ideas, which are language independent. We have to try to decouple 1) and 2) and derive the invariant by throwing away the pieces of 1) that are totally language dependent and hence have no bearing on the "target" language, e.g. Haskell.

E.g. the C++ implementation uses a "thread driver" to move the segment processing forward over time. Is this part of 1) or 2)? (I am assuming that 1) and 2) are disjoint.) Answer: I don't know. I am sorry for all of the tortured thinking.

Bottom line: we need to understand the C++ software architecture globally in order to see which part is language dependent (and can be thrown away) and which part is language independent, which I think is the invariant. Then use the language-invariant part to design and implement in Haskell.

Shields up: I anticipate flames in my lower parts .. :-(

Vasyl

On Tuesday, February 2, 2016, Olaf Klinke <o...@aatal-apotheke.de> wrote:
> One is on GitHub, one is on sourceforge. Google
>
> bwa site:github.com
>
> or go to
>
> http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.2.6/
>
> Olaf
>
> > On 02.02.2016 at 01:33, Vasili I. Galchin <vigalc...@gmail.com> wrote:
> >
> > No promises. I have done a lot of reverse engineering in C/C++. So
> > where is the extant C/C++ source??
> >
> > On Mon, Feb 1, 2016 at 5:21 PM, Olaf Klinke <o...@aatal-apotheke.de> wrote:
> >> Yes, there is a TODO: Get rid of the "cheap watches" spam sent to this list. In fact the signal/noise ratio makes me want to unsubscribe.
> >>
> >> Other than that, I'd like to see someone reverse-engineer one of the major sequence index formats and provide a Haskell interface, so that we can design our own functional alignment algorithms instead of building shell scripts around bowtie or bwa.
> >> By reverse-engineer I mean look at the source code. It's all there, but poorly documented. I understand too little C/C++ to make sense of how precisely these index structures are stored. But if one could write a Data.Binary instance, that'd be awesome.
> >> Meanwhile I implemented a Lempel-Ziv together with full-text search on the compressed data (not my idea). This is possible if one uses one trie for the entire text. However, full-text search only succeeds if the match overlaps a block boundary. That should be fine for sufficiently long queries.
> >>
> >> --Olaf
> >