Re: Some initial results with DPH

Roman Leshchinskiy Mon, 22 Sep 2008 22:00:06 -0700

Hi Austin,

first of all, thanks a lot for taking the time to report your results!


On 23/09/2008, at 11:48, Austin Seipp wrote:

* The vectorise pass boosts compilation times *a lot*. I don't think
 this is exactly unwarranted, since it seems like a pretty complicated
 transformation, but while the primitive version using just the
 unlifted interface compiles in about 1.5 seconds, the vectorised
 version takes on the order of 15 seconds. For something as trivial
 as this dot-product example, that's quite a long compile time.

The problem here is not the vectoriser but rather the subsequent optimisations. The vectoriser itself is (or should be - I haven't really timed it, to be honest) quite fast. It generates very complex code, however, which GHC takes a lot of time to optimise. We'll improve the output of the vectoriser eventually, but not before it is complete. For the moment, there is no solution for this, I'm afraid.

* It's pretty much impossible to use ghc-core to examine the output
 core of the vectorised version - I let it run and before anything
 started showing up in `less` it was already using on the order of
 100 MB of memory. If I just add -ddump-simpl to the command line, the
 reason is obvious: the core generated is absolutely huge.


Yes. Again, this is something we'll try to improve eventually.

* For the included benchmark, the vectorised version spends about 98%
 of its time in the GC, from what I can see, before it dies from a
 stack overflow. I haven't tried something like +RTS -A1G -RTS yet, though.


IIUC, the code is

dotp :: [:Int:] -> [:Int:] -> Int
dotp v w = I.sumP [: (I.*) x y | x <- v, y <- w :]

The way the vectoriser works at the moment, it will repeat the array w (lengthP v) times, i.e., create an array of length (lengthP v * lengthP w). This is quite unfortunate and needs to be fused away but isn't at the moment. The only advice I can give is to stay away from array comprehensions for now. They work but are extremely slow. This definition should work fine:


dotp v w = I.sumP (zipWithP (I.*) v w)
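To make the blow-up concrete, here is a small model that uses ordinary Haskell lists as a stand-in for [: :] parallel arrays (an assumption: sum, zipWith and length play the roles of sumP, zipWithP and lengthP). The helper names are hypothetical and not part of DPH. It builds, roughly, the two replicated arrays that flattening the nested generators x <- v, y <- w has to materialise, each of length (length v * length w), and compares that with the single-pass zipWith version.

```haskell
-- A sketch using plain lists in place of DPH's [: :] arrays.
-- replicatedPairs, dotpComprehension, dotpReplicated and dotpZip are
-- hypothetical helpers for illustration only.
module Main where

-- Roughly what the flattened form of [: x * y | x <- v, y <- w :] needs:
-- each x repeated (length w) times, and w repeated (length v) times.
-- Both resulting lists have length (length v * length w).
replicatedPairs :: [Int] -> [Int] -> ([Int], [Int])
replicatedPairs v w =
  ( concatMap (replicate (length w)) v   -- x0,x0,...,x1,x1,...
  , concat (replicate (length v) w) )    -- w ++ w ++ ... (length v copies)

-- Model of the comprehension with comma-separated generators:
-- it multiplies every x with every y.
dotpComprehension :: [Int] -> [Int] -> Int
dotpComprehension v w = sum [ x * y | x <- v, y <- w ]

-- The same sum, computed from the explicitly replicated lists.
dotpReplicated :: [Int] -> [Int] -> Int
dotpReplicated v w = sum (zipWith (*) xs ys)
  where (xs, ys) = replicatedPairs v w

-- Model of the suggested definition: one pass, no intermediate growth.
dotpZip :: [Int] -> [Int] -> Int
dotpZip v w = sum (zipWith (*) v w)

main :: IO ()
main = do
  let v = [1, 2, 3]
      w = [4, 5, 6]
      (xs, _) = replicatedPairs v w
  print (length xs)              -- length v * length w
  print (dotpComprehension v w)
  print (dotpReplicated v w)
  print (dotpZip v w)
```

With v = [1,2,3] and w = [4,5,6], the intermediate lists have 9 elements, and the comprehension sums all nine pairwise products (90), while zipWith pairs elements by position (32), which is the usual dot product. The model therefore also shows that the two definitions compute different values, not just that one is slower.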

* The vectoriser is really, really touchy. For example, the below code
 sample works (from DotPVect.hs):

import Data.Array.Parallel.Prelude.Int as I

dotp :: [:Int:] -> [:Int:] -> Int
dotp v w = I.sumP [: (I.*) x y | x <- v, y <- w :]


This, however, does not work:

dotp :: [:Int:] -> [:Int:] -> Int
dotp v w = I.sumP [: (Prelude.*) x y | x <- v, y <- w :]

This is because the vectorised code needs to call the vectorised version of (*). Internally, the vectoriser has a hardwired mapping from top-level functions to their vectorised versions. That is, it knows that it should replace calls to (Data.Array.Parallel.Prelude.Int.*) by calls to Data.Array.Parallel.Prelude.Base.Int.plusV. There is no vectorised version of (Prelude.*), however, and there won't be one until we can vectorise the Prelude. In fact, the vectoriser doesn't even support classes at the moment. So the rule of thumb is: unless it's in Data.Array.Parallel.Prelude or you wrote and vectorised it yourself, it will choke the vectoriser.
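A toy model of the hardwired mapping may help explain why (Prelude.*) fails while (I.*) works. This is not GHC's implementation; the table, its placeholder right-hand sides, and vectoriseCall are all hypothetical. The point is only that lookup is by exact top-level name, so a function missing from the table has nothing to be replaced with.

```haskell
-- A toy model (NOT the real vectoriser) of a fixed table mapping
-- top-level functions to their vectorised versions.
module Main where

-- Hypothetical table. The right-hand sides are placeholders, not real
-- DPH identifiers.
vectTable :: [(String, String)]
vectTable =
  [ ("Data.Array.Parallel.Prelude.Int.*", "<vectorised Int (*)>")
  , ("Data.Array.Parallel.Prelude.Int.+", "<vectorised Int (+)>")
  ]

-- Replace a call by its vectorised version, or fail if the function is
-- not in the table, as with (Prelude.*) above.
vectoriseCall :: String -> Either String String
vectoriseCall f = case lookup f vectTable of
  Just fv -> Right fv
  Nothing -> Left ("no vectorised version of " ++ f)

main :: IO ()
main = do
  print (vectoriseCall "Data.Array.Parallel.Prelude.Int.*")
  print (vectoriseCall "Prelude.*")
```

Because the table is keyed on the fully qualified name, (Prelude.*) and (Data.Array.Parallel.Prelude.Int.*) are unrelated entries even though both multiply Ints, which matches the rule of thumb above.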

I also ran into a few other errors related to the vectoriser dying; if
I can find some of them again, I'll reply to this with the results.

Please do! And please keep using DPH and reporting your results; that is really useful to us!

FWIW, we'll include some DPH documentation in 6.10, but it still has to be written...


Roman


_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
