Hi Bulat,

obviously, arrays version should create no temporary cells.

that's why the memory traffic surprised me. i knew there had to be something 
wrong.

the problems was mainly due to 2 factors:
1) readArray m (i,j)

yes, indeed. since we are dealing in bulk operations, we might as well take advantage of that, so dropping the repeated bounds-checks inside the loops makes a lot of sense.

moral/note to self: bulk array operations are your friend (i knew that!-), but you need to use that when defining them (unsafeRead et al are only for library writers, but library writers ought to use them; and i was writing a small inlined library)

2) 'op' in 'l' which was passed as real closure and was not inlined
due to weakness of ghc optimizer

it irked me having to write the same loop twice, but i didn't consider the 
consequences.
an INLINE pragma on l almost seems sufficient to regain the loss, so i prefer that; but writing out the loop twice is still a tiny tick faster..

we should help strictness analyzer by marking all the variables used in tight loops as strict.

ah, there were a few surprises in that one. i thought i had considered possible 
strictness
problems, but obviously, i missed a few relevant possibilities. annotating 
everything is the
safe option, and clearly documents the intentions, but i cannot help thinking about which annotations could be omitted:

- modArray: a and i are both used anyway
- i index in loop is definitely checked (but j isn't, and some others weren't, either; so better safe than sorry)
- some of the parameters need not be annotated in this example, but should be 
if one
wanted to reuse the code elsewhere
- the one i really don't get yet is the one on the accumulators (!s in l, in 
dotA/matA);
 i thought i had covered that by using $! in applying the loop, but annotating 
the
 formal loop parameter, apart from being nicer, also seems faster..

moral/note to self: don't try to be clever, try to be clear..; strictness in formal parameters is better than strictness in actual parameters; bang-patterns are good;-)

after that is done, we got 1000 times less temporary data allocated and 5x faster execution. now it's a bit faster than strict lists

is this now anywhere near what the number-crunchers use, when they use 
Haskell?-)

Bulat, thanks for looking into it and for isolating the issues so quickly!-)
claus

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to