Hi Vasili,

> Are most other biohaskell members interested in your suggested
> functionality regarding the bowtie implementation?

I think Olaf is more interested in the sequence formats that bowtie
uses than in bowtie itself. Efficient encoding and decoding of all
kinds of bio-formats is quite useful for many of us. Particular
algorithms (like bowtie), however, are probably of interest to only
one or two people, and only if they happen to be working on such a
problem at the time.

However, that should *not* deter you. Building things up makes for a
good learning experience, especially if there is something like a
reference implementation to compare against.

Best regards,
Christian

* Vasili I. Galchin <vigalc...@gmail.com> [04.02.2016 06:12]:
> Olaf,
>
>   Are most other biohaskell members interested in your suggested
> functionality regarding the bowtie implementation?
>
>   I looked a little at the bowtie C++ source. Mounds of code :-)
>
>   Ok ... we need to look for "invariants" (not exactly as in pure
> maths, but something like that, IMO) between different software
> architectures. Sorry, I know my thoughts are very hard to follow. In
> software the notion of "invariant" is very loose. I am thinking "out
> loud", i.e. thinking as I am writing. OK ... the Johns Hopkins author
> made architectural decisions that are linked to 1) his language
> choice, i.e. the C++ language, and 2) his personal ideas, which are
> language independent. We have to try to decouple 1) and 2) to derive
> the invariant by throwing away the pieces of 1) that are totally
> language dependent and hence have no bearing on the "target"
> language, e.g. Haskell. E.g. the C++ implementation uses a "thread
> driver" to move the segment processing forward over time. Is this
> part of 1) or 2) (I am assuming that 1) and 2) are disjoint)?
> Answer: I don't know. I am sorry for all of the tortured thinking.
> Bottom line: we need to understand the C++ software architecture
> globally to work out which part is language dependent (and can be
> thrown away) and which part is language independent, i.e. what I
> think of as the language invariant. Then use the language-invariant
> part to design and implement in Haskell. Shields up: I anticipate
> flames in my lower parts .. :-(
>
> Vasyl
>
> On Tuesday, February 2, 2016, Olaf Klinke <o...@aatal-apotheke.de> wrote:
> > One is on GitHub, one is on sourceforge. Google
> > bwa site:github.com
> > or go to
> > http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.2.6/
> > Olaf
> >
> > > On 02.02.2016 at 01:33, Vasili I. Galchin <vigalc...@gmail.com> wrote:
> > >
> > > No promises. I have done a lot of reverse engineering in C/C++.
> > > So where is the extant C/C++ source??
> > >
> > > On Mon, Feb 1, 2016 at 5:21 PM, Olaf Klinke <o...@aatal-apotheke.de> wrote:
> > >> Yes, there is a TODO: Get rid of the "cheap watches" spam sent
> > >> to this list. In fact the signal/noise ratio makes me want to
> > >> unsubscribe.
> > >>
> > >> Other than that, I'd like to see someone reverse-engineer one
> > >> of the major sequence index formats and provide a Haskell
> > >> interface, so that we can design our own functional alignment
> > >> algorithms instead of building shell scripts around bowtie or
> > >> bwa.
> > >> By reverse-engineer I mean look at the source code. It's all
> > >> there, but poorly documented. I understand too little C/C++ to
> > >> make sense of precisely how these index structures are stored.
> > >> But if one could write a Data.Binary instance, that'd be
> > >> awesome.
> > >> Meanwhile I implemented Lempel-Ziv compression together with
> > >> full-text search on the compressed data (not my idea). This is
> > >> possible if one uses one trie for the entire text. However,
> > >> full-text search only succeeds if the match overlaps a block
> > >> boundary. That should be fine for sufficiently long queries.
> > >>
> > >> --Olaf