Hi everybody,

Some memory profiling seems like a good idea, though I guess we can kill a
big bunch of problems by switching over to the cool stuff (i.e. fused,
unboxed) -- currently planned for the next two weeks.
Btw., I basically never have any memory or garbage collection problems when
writing ADPfusion code -- even for the linguistics stuff another student in
Leipzig is working on, where the input sequences are less than 10 (!)
characters long most of the time.

Regarding the data representation: I'd be a bit reluctant to have Sarah
introduce anything fancier than, say, two modules where the only exported
'align' function has the same interface. There are not that many choices of
high-performance libraries: (i) my ADPfusion library [1-core, high-level],
(ii) repa [multi-core, low-level], (iii) accelerate [GPU ;-), looow-level].
Still, it is probably a good idea to encapsulate things in such a way that
we [Sarah ;-)] can easily compare the current implementation against any
new one.

Maybe Sarah can give more details on the typical sequence lengths; they are
somewhere in the 100s of nucleotides, so rather harmless.

Best regards,
Christian

* Ketil Malde <ke...@malde.org> [27.05.2014 09:32]:
> > > Here is the post: http://biohaskell.org/GSoC_blog/Weeks_1and2
>
> Great! Is this (i.e. the profiling numbers) from running the benchmark
> I sent you? So you got it running, and it's generating acceptable
> output?
>
> As you noticed, we spend way too much time garbage collecting. The
> solution is not *necessarily* to use less space, but I suspect this is
> due to the sparse representation using lists generating (and GC'ing)
> many, many cons cells. By using a non-sparse representation (that is, a
> matrix), space use might be larger, but more predictable/constant (but
> make sure to use a mutable matrix).
>
> What you could do is run with memory profiling (-h or -hd, IIRC) to see
> exactly what data is generated.
>
> And -- it is possible to alleviate this somewhat by tuning GC (RTS
> options like -A, -M, and -H, IIRC).
>
> -k
>
> PS: the options are from memory, but can be looked up easily enough.
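For reference, Ketil's from-memory options do check out; here is a sketch of
how they fit together, assuming a hypothetical `align` binary built from an
`Align.hs` (both names are placeholders, not the project's actual files):

```
# Build with profiling support and allow RTS options at run time:
ghc -O2 -prof -fprof-auto -rtsopts Align.hs -o align

# Heap profile by closure description (-hd): shows whether cons cells
# dominate allocation. Writes align.hp; render it with hp2ps.
./align input.fasta +RTS -hd -p -RTS
hp2ps -c align.hp

# GC tuning without code changes: larger allocation area (-A), a
# suggested heap size (-H), and a hard maximum heap (-M).
./align input.fasta +RTS -A64m -H256m -M1g -RTS
```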
> --
> If I haven't seen further, it is by standing in the footprints of giants

* Johannes Waldmann <johannes.waldm...@htwk-leipzig.de> [27.05.2014 09:58]:
> Dear all,
>
> > due to the sparse representation using lists generating (and GC'ing)
> > many, many cons cells.
>
> uh, actual lists (Prelude.[])?
>
> > By using a non-sparse representation (that is, a
> > matrix), space use might be larger, but more predictable/constant (but
> > make sure to use a mutable matrix).
>
> (perhaps out of scope for this student project, but) make sure to
> introduce an abstraction layer so that it is possible to switch the
> underlying data representation without changing the implementation of
> the algorithm?
>
> There are lots of ways to represent matrices (both sparse and full).
> What size are we talking about for this application?
>
> > What you could do, is run with memory profiling
>
> Also, http://hackage.haskell.org/package/ekg for live profiling.
>
> Good luck with the project
>
> - J.
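A minimal sketch of the abstraction-layer idea: two interchangeable
implementations of Levenshtein alignment cost behind the same 'align'-style
signature, so the list-based and mutable-matrix representations can be
swapped and benchmarked against each other. All names (`alignList`,
`alignMatrix`) are illustrative, not from the project's code:

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.Array (listArray, (!))
import Data.Array.ST (STUArray, newArray, readArray, writeArray)

-- List-based rows: concise, but builds (and GCs) a cons cell per matrix entry.
alignList :: String -> String -> Int
alignList xs ys = last (foldl transform [0 .. length ys] xs)
  where
    transform ns@(n : ns') c = scanl calc (n + 1) (zip3 ys ns ns')
      where
        calc z (c', x, y) = minimum [y + 1, z + 1, x + fromEnum (c /= c')]
    transform [] _ = error "impossible: rows are never empty"

-- Mutable unboxed matrix in ST: fixed-shape storage, no per-cell garbage.
alignMatrix :: String -> String -> Int
alignMatrix xs ys = runST $ do
  let n = length xs
      m = length ys
      xa = listArray (1, n) xs
      ya = listArray (1, m) ys
  d <- newArray ((0, 0), (n, m)) 0 :: ST s (STUArray s (Int, Int) Int)
  forM_ [0 .. n] $ \i -> writeArray d (i, 0) i
  forM_ [0 .. m] $ \j -> writeArray d (0, j) j
  forM_ [1 .. n] $ \i ->
    forM_ [1 .. m] $ \j -> do
      del  <- (+ 1) <$> readArray d (i - 1, j)
      ins  <- (+ 1) <$> readArray d (i, j - 1)
      diag <- readArray d (i - 1, j - 1)
      let sub = diag + fromEnum (xa ! i /= ya ! j)
      writeArray d (i, j) (minimum [del, ins, sub])
  readArray d (n, m)

main :: IO ()
main = print (alignList "kitten" "sitting", alignMatrix "kitten" "sitting")
-- prints (3,3)
```

With both behind one signature, a criterion benchmark or a heap profile can
compare the representations without touching the surrounding algorithm.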