Re: current projects in progress and/or list of things to do

Christian Höner zu Siederdissen Thu, 04 Feb 2016 02:55:59 -0800

Hi Vasili,

>    Are most other biohaskell members interested in your suggested
>    functionality regarding the bowtie implementation?


I think Olaf is more interested in the sequence formats that are in use
in bowtie, than in bowtie itself. Having efficient (en|de)-coding of all
kinds of bio-formats is quite useful for many of us.
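
As a rough illustration of what such (en|de)-coding could look like, here
is a minimal sketch of a Data.Binary instance. The IndexHeader record, its
fields, and the little-endian layout are made up for the example; this is
not the real bowtie or bwa index layout, which would have to be read off
the C/C++ source.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getWord32le, getWord64le)
import Data.Binary.Put (putWord32le, putWord64le)
import Data.Word (Word32, Word64)

-- Hypothetical fixed-size header (assumption, NOT the bowtie/bwa format).
data IndexHeader = IndexHeader
  { magic      :: Word32  -- file-type marker
  , version    :: Word32  -- format version
  , textLength :: Word64  -- number of bases in the indexed text
  } deriving (Eq, Show)

-- Fields are written and read in the same order, all little-endian.
instance Binary IndexHeader where
  put (IndexHeader m v n) = putWord32le m >> putWord32le v >> putWord64le n
  get = IndexHeader <$> getWord32le <*> getWord32le <*> getWord64le

-- Encoding followed by decoding should give back the original value.
roundTrip :: IndexHeader -> Bool
roundTrip h = decode (encode h) == h
```

Once an instance like this exists, Data.Binary's decodeFile and encodeFile
work on the type directly.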

Particular algorithms (like bowtie), however, are probably of interest
only to 1-2 people, if they happen to work on such a problem at the
time.

However, such things should *not* deter you. Building things up makes
for a good learning experience, especially if there is something like a
reference implementation to compare against.

Viele Gruesse,
Christian



* Vasili I. Galchin <vigalc...@gmail.com> [04.02.2016 06:12]:
>    Olaf,
>    Are most other biohaskell members interested in your suggested
>    functionality regarding the bowtie implementation?
> 
>    I looked a little at the bowtie C++ source. Mounds of code :-)
> 
>    OK ... we need to look for "invariants" (not exactly as in pure
>    maths, but something like that, IMO) between different software
>    architectures. Sorry, I know my thoughts are very hard to follow. In
>    software the notion of "invariant" is very loose. I am thinking out
>    loud, i.e. thinking as I am writing. OK ... the Johns Hopkins author
>    made architectural decisions that are linked 1) to his language choice,
>    i.e. C++, and 2) to his personal ideas, in a language-independent way.
>    We have to try to decouple 1) and 2) to derive the invariant by throwing
>    away the pieces of 1) that are totally language dependent and hence have
>    no bearing on the "target" language, e.g. Haskell. E.g. the C++
>    implementation uses a "thread driver" to move the segment processing
>    forward over time. Is this part of 1) or 2) (I am assuming that 1) and
>    2) are disjoint)? Answer: I don't know. I am sorry for all of the
>    tortured thinking. Bottom line: we need to understand the C++ software
>    architecture globally to separate the language-dependent part (which
>    can be thrown away) from the language-independent part, which I think
>    is the language invariant. Then use the language-invariant part to
>    design and implement in Haskell. Shields up: I anticipate flames in my
>    lower parts .. :-(
> 
>    Vasyl
> 
>    On Tuesday, February 2, 2016, Olaf Klinke <o...@aatal-apotheke.de> wrote:
> 
>      One is on GitHub, one is on sourceforge. Google
> 
>      bwa site:github.com
> 
>      or go to
> 
>      http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.2.6/
> 
>      Olaf
> 
>      > Am 02.02.2016 um 01:33 schrieb Vasili I. Galchin
>      <vigalc...@gmail.com>:
>      >
>      > No promises. I have done a lot of reverse engineering in C/C++. So
>      > where is the extant C/C++ source??
>      >
>      > On Mon, Feb 1, 2016 at 5:21 PM, Olaf Klinke <o...@aatal-apotheke.de>
>      wrote:
>      >> Yes, there is a TODO: Get rid of the "cheap watches" spam sent to
>      this list. In fact the signal/noise ratio makes me want to unsubscribe.
>      >>
>      >> Other than that, I'd like to see someone reverse-engineer one of the
>      major sequence index formats and provide a Haskell interface, so that we
>      can design our own functional alignment algorithms instead of building
>      shell scripts around bowtie or bwa.
>      >> By reverse-engineer I mean look at the source code. It's all there,
>      but poorly documented. I understand too little C/C++ to make sense of
>      how precisely these index structures are stored. But if one could write
>      a Data.Binary instance, that'd be awesome.
>      >> Meanwhile I implemented Lempel-Ziv compression together with full-text search
>      on the compressed data (not my idea). This is possible if one uses one
>      trie for the entire text. However, full-text search only succeeds if the
>      match overlaps a block boundary. That should be fine for sufficiently
>      long queries.
>      >>
>      >> --Olaf
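
For readers who have not seen Lempel-Ziv with a single trie for the whole
text: the following is a minimal LZ78-style sketch, not Olaf's code. It
assumes the trie is stored flat as a map from (node, character) to child
node, and the names Trie, Token, compress and decompress are invented for
the example. Only compression and decompression are shown; full-text
search on the compressed data is left out.

```haskell
import qualified Data.Map.Strict as M

-- One trie for the entire text, stored as edges: (parent node, char) -> child.
-- Node 0 is the root (the empty phrase).
type Trie = M.Map (Int, Char) Int

-- A token names the longest phrase already in the trie plus the character
-- that extends it. The last token has no extension if the input ends mid-phrase.
type Token = (Int, Maybe Char)

compress :: String -> [Token]
compress = go M.empty 1 0
  where
    go :: Trie -> Int -> Int -> String -> [Token]
    go _ _ 0 [] = []
    go _ _ node [] = [(node, Nothing)]
    go trie next node (c : cs) =
      case M.lookup (node, c) trie of
        -- Edge exists: keep walking down the trie.
        Just child -> go trie next child cs
        -- No edge: emit a token, add the new phrase, restart at the root.
        Nothing ->
          (node, Just c) : go (M.insert (node, c) next trie) (next + 1) 0 cs

decompress :: [Token] -> String
decompress = go (M.singleton 0 "") 1
  where
    go _ _ [] = []
    go phrases next ((i, mc) : ts) =
      let phrase = M.findWithDefault "" i phrases ++ maybe "" pure mc
       in phrase ++ go (M.insert next phrase phrases) (next + 1) ts
```

Each token corresponds to one block of the text, which is where the block
boundaries mentioned above come from.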
