The main takeaway from my work with prefetching was this: prefetching just to kickstart the next element during a tree traversal rarely pays off, because that element is demanded too soon to hide the full memory latency. If you can instead shove things into a fixed-size queue and prefetch on the way into the queue, you can smooth out a lot of the cross-system variance.
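
To make the queue idea concrete, here is a minimal C sketch of a mark loop with a small FIFO on the output of the mark stack. It is not GHC's collector and not the code I benchmarked; Node, MarkStack, PrefetchQueue, mark_from and the QUEUE_SIZE value are all hypothetical names chosen for illustration, and __builtin_prefetch is the GCC/Clang builtin, which is only a hint to the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define QUEUE_SIZE 8  /* assumed prefetch distance; needs tuning per machine */

/* Hypothetical heap object: two child pointers and a mark bit. */
typedef struct Node {
    struct Node *left, *right;
    bool marked;
} Node;

/* Small fixed-size FIFO sitting on the output of the mark stack. */
typedef struct {
    Node *slots[QUEUE_SIZE];
    size_t head, count;
} PrefetchQueue;

/* Growable mark stack of objects still to be visited. */
typedef struct {
    Node **items;
    size_t len, cap;
} MarkStack;

static void push(MarkStack *s, Node *n) {
    if (s->len == s->cap) {
        size_t cap = s->cap ? s->cap * 2 : 64;
        Node **items = realloc(s->items, cap * sizeof *items);
        if (!items) abort();
        s->items = items;
        s->cap = cap;
    }
    s->items[s->len++] = n;
}

/* Prefetch happens here, on the way INTO the queue. */
static void enqueue(PrefetchQueue *q, Node *n) {
    assert(q->count < QUEUE_SIZE);
    __builtin_prefetch(n, 1, 3);  /* hint: will write, keep in cache */
    q->slots[(q->head + q->count) % QUEUE_SIZE] = n;
    q->count++;
}

static Node *dequeue(PrefetchQueue *q) {
    assert(q->count > 0);
    Node *n = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    return n;
}

/* Marks everything reachable from root; returns how many nodes were newly marked. */
size_t mark_from(Node *root) {
    MarkStack s = {0};
    PrefetchQueue q = {0};
    size_t marked = 0;

    if (root) push(&s, root);
    while (s.len > 0 || q.count > 0) {
        /* Move work from the stack into the queue, prefetching as we go. */
        while (s.len > 0 && q.count < QUEUE_SIZE)
            enqueue(&q, s.items[--s.len]);

        /* Only the oldest queued object is actually touched. */
        Node *n = dequeue(&q);
        if (n->marked) continue;
        n->marked = true;
        marked++;
        if (n->left) push(&s, n->left);
        if (n->right) push(&s, n->right);
    }
    free(s.items);
    return marked;
}
```

While the stack keeps the queue full, an object is touched roughly QUEUE_SIZE - 1 dequeues after its prefetch was issued, which is the known lead-in time a plain stack walk lacks. When the stack runs nearly dry (say, a long list-like chain) the queue drains immediately and the prefetch buys nothing, so this sketch should be read as an illustration of the shape of the change, not a tuned implementation.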
It is just incredibly invasive. =(

Re: doing prefetching in the mark phase, I just skimmed and found
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.9090&rep=rep1&type=pdf
which appears to take a similar approach.

-Edward

On Fri, Nov 28, 2014 at 3:42 AM, Simon Marlow <[email protected]> wrote:

> Thanks for this. In the copying GC I was using prefetching during the
> scan phase, where you do have a pretty good tunable knob for how far ahead
> you want to prefetch. The only variable is the size of the objects being
> copied, but most tend to be in the 2-4 words range. I did manage to get
> 10-15% speedups with optimal tuning, but it was a slowdown on a different
> machine or with wrong tuning, which is why GHC doesn't have any of this
> right now.
>
> Glad to hear this can actually be used to get real speedups in Haskell, I
> will be less sceptical from now on :)
>
> Cheers,
> Simon
>
> On 27/11/2014 10:20, Edward Kmett wrote:
>
>> My general experience with prefetching is that it is almost never a win
>> when done just on trees, as in the usual mark-sweep or copy-collection
>> garbage collector walk. Why? Because the time from the time you prefetch
>> to the time you use the data is too variable. Stack disciplines and
>> prefetch don't mix nicely.
>>
>> If you want to see a win out of it you have to free up some of the
>> ordering of your walk, and tweak your whole application to support it.
>> e.g. if you want to use prefetching in garbage collection, the way to do
>> it is to switch from a strict stack discipline to using a small
>> fixed-sized queue on the output of the stack, then feed prefetch on the
>> way into the queue rather than as you walk the stack. That paid out for
>> me as a 10-15% speedup last time I used it after factoring in the
>> overhead of the extra queue. Not too bad for a weekend project. =)
>>
>> Without that sort of known lead-in time, it works out that prefetching
>> is usually a net loss or vanishes into the noise.
>>
>> As for the array ops, davean has a couple of cases w/ those for which
>> the prefetching operations are a 20-25% speedup, which is what motivated
>> Carter to start playing around with these again. I don't know off hand
>> how easily those can be turned into public test cases though.
>>
>> -Edward
>>
>> On Thu, Nov 27, 2014 at 4:36 AM, Simon Marlow <[email protected]> wrote:
>>
>>     I haven't been watching this, but I have one question: does
>>     prefetching actually *work*? Do you have benchmarks (or better
>>     still, actual library/application code) that show some improvement?
>>     I admit to being slightly sceptical - when I've tried using
>>     prefetching in the GC it has always been a struggle to get something
>>     that shows an improvement, and even when I get things tuned on one
>>     machine it typically makes things slower on a different processor.
>>     And that's in the GC, doing it at the Haskell level should be even
>>     harder.
>>
>>     Cheers,
>>     Simon
>>
>>
>>     On 22/11/2014 05:43, Carter Schonwald wrote:
>>
>>         Hey Everyone,
>>         in
>>         https://ghc.haskell.org/trac/ghc/ticket/9353
>>         and
>>         https://phabricator.haskell.org/D350
>>
>>         is some preliminary work to fix up how the pure versions of the
>>         prefetch primops work is laid out and prototyped.
>>
>>         However, while it nominally fixes up some of the problems with
>>         how the current pure prefetch apis are fundamentally borken, the
>>         simple design in D350 isn't quite ideal, and i sketch out some
>>         other ideas in the associated ticket #9353
>>
>>         I'd like to make sure pure prefetch in 7.10 is slightly less
>>         broken than in 7.8, but either way, its pretty clear that working
>>         out the right fixed up design wont happen till 7.12.
>>         Ie, whatever makes 7.10, there WILL have to be breaking changes
>>         to fix those primops for 7.12
>>
>>         thanks and any feedback / thoughts appreciated
>>         -Carter
>>
_______________________________________________ ghc-devs mailing list [email protected] http://www.haskell.org/mailman/listinfo/ghc-devs
