Some fun follow-up comments: Indeed, lists are quite ok, especially if new cons cells come directly out of the nursery -- I'm assuming that full fusion won't kick in due to the complex stuff going on.
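To make the "lists as a loop" point concrete, here is a minimal sketch (not code from the project; `score` and `bestScore` are made-up names for illustration). The list is produced lazily and consumed right away by a strict fold, so each cons cell is garbage as soon as it has been visited and never needs to leave the nursery. In a case this simple GHC may even fuse the list away entirely with -O; I'm assuming that does not happen in the real, more complex code.

```haskell
import Data.List (foldl')

-- Hypothetical per-cell score, only here to give the fold something to do.
score :: Int -> Int
score i = (i * i) `mod` 7

-- The scores are produced lazily by 'map' and consumed immediately by the
-- strict left fold, so at most a few cons cells are live at any time.
bestScore :: Int -> Int
bestScore n = foldl' max 0 (map score [1 .. n])

main :: IO ()
main = print (bestScore 1000000)
```

Swapping `foldl'` for the lazy `foldl` is the kind of thing that tends to show up in a memory plot, because unoptimised code then builds a chain of thunks before anything is evaluated.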
I've seen one of the memory plots from Sarah, we'll skype a bit about it; but in general, there is a need for strictification and moving towards stream fusion & vectors; but: http://ro-che.info/ccc/02

Of course, may we have some magic please: ;-)
https://github.com/choener/DnaProteinAlignment
[it's on Hackage too, but I've just noticed a fun bug]

Anyway, we ran this algorithm on protein / mitogenome alignments quite successfully. Of course, it'll still require an awesome amount of memory [Again, more fun during the call ;-)].

===

And indeed, we should definitely keep the original version around; in principle we can even quickcheck everything (old == new).

Best regards,
Christian

PS: I actually prefer to have a lot of this stuff here on biohaskell, that way we keep these discussions public.

* Ketil Malde <ke...@malde.org> [27.05.2014 13:58]:
>
> Johannes Waldmann <johannes.waldm...@htwk-leipzig.de> writes:
>
> > uh, actual lists (Prelude.[])?
>
> *blush*
>
> Seriously, they aren't all bad, the lists are produced and consumed
> lazily, so it's more like a loop.  Or it was intended to be, but clearly
> there are still some problems.
>
> > there are lots of ways to represent matrices (both sparse and full).
> > what size are we talking about, for this application?
>
> The matrix needs to be query sequence (typically nucleotide) times final
> target sequence (typically protein).  Apparently titin holds the record
> of some tens of thousands of amino acids with the transcript thrice that
> - so let's call it 100K x 35K or a total of 3.5G cells.  Ouch...maybe
> this isn't such a good idea after all.
>
> -k
> --
> If I haven't seen further, it is by standing in the footprints of giants

* Johannes Waldmann <johannes.waldm...@htwk-leipzig.de> [27.05.2014 13:56]:
> Hi,
>
> > There are not that many choices of high-performance libraries:
>
> (this is just for my understanding)
>
> in principle, the choice between different representations
> can be expressed by associated types?
> (or whatever it's called these days)
>
> > (i) my adpfusion library [1-core, high-level], (ii) ...
>
> don't forget (0) - straightforward implementation (Prelude.[])
>
> should be used to (a) understand the algorithmic idea
> (b) for automated tests (to compare results with (i) ..)
>
> - J.
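
A rough sketch of the "(0) as test oracle" / old == new idea, using only base. This is not the alignment code from the thread: plain edit distance stands in for the real recurrence, and `editNaive`, `editRows`, `inputs` and `agree` are hypothetical names. The naive recursion plays the role of the straightforward reference, the row-wise version plays the role of the optimised one, and the check enumerates all short inputs instead of generating random ones as QuickCheck would.

```haskell
import Control.Monad (replicateM)
import Data.List (foldl')

-- Reference ("old"): direct transcription of the recurrence, exponential time.
editNaive :: String -> String -> Int
editNaive [] ys = length ys
editNaive xs [] = length xs
editNaive (x:xs) (y:ys) = minimum
  [ editNaive xs ys + (if x == y then 0 else 1)  -- match / mismatch
  , editNaive xs (y:ys) + 1                      -- delete x
  , editNaive (x:xs) ys + 1 ]                    -- insert y

-- Force every element of a list to WHNF, then return the list.
forceList :: [a] -> [a]
forceList xs = foldr seq () xs `seq` xs

-- Stand-in for the optimised ("new") version: one DP row at a time,
-- each row fully evaluated before the next one is started.
editRows :: String -> String -> Int
editRows xs ys = last (foldl' step [0 .. length ys] xs)
  where
    step [] _ = []  -- unreachable: every row has length ys + 1 cells
    step prev@(p:ps) x = forceList (scanl cell (p + 1) (zip3 ys prev ps))
      where
        cell left (y, diag, up) =
          minimum [diag + (if x == y then 0 else 1), up + 1, left + 1]

-- All strings over a small alphabet up to length 3 (85 strings).
inputs :: [String]
inputs = concatMap (\n -> replicateM n "ACGT") [0 .. 3]

-- old == new on every pair of inputs.
agree :: Bool
agree = and [editNaive a b == editRows a b | a <- inputs, b <- inputs]

main :: IO ()
main = print agree
```

With QuickCheck available, the enumeration would be replaced by a property over generated strings; the structure (keep the slow version, compare results) stays the same.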