On Jul 7, 6:23 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> On Tuesday 07 July 2009 02:08:57 Bradbev wrote:
>
> > On Jul 6, 4:30 pm, fft1976 <fft1...@gmail.com> wrote:
> > > On Jul 5, 11:42 pm, Bradbev <brad.beveri...@gmail.com> wrote:
> > > > more to modern x86 chips. After you have the best algorithm for the
> > > > job, you very quickly find that going fast is entirely bound by memory
> > > > speed (actually latency) - cache misses are the enemy.
> > >
> > > IME (outside the JVM), this depends strongly on the kind of problem you
> > > are solving as well as your implementation (you need to know how to
> > > cache-optimize). One can easily think of problems that would fit
> > > entirely in cache but take an enormous amount of time.
> >
> > What sort of problems did you have in mind? Anything that I can think
> > of quickly spills the cache.
>
> There are many examples in scientific computing where many small problems
> are attacked instead of one large problem. For example, practical use of
> FFTs falls into this category, with most users performing many transforms
> of no more than 1,024 elements rather than fewer, longer transforms.
> Time-frequency analysis usually takes n samples and produces an n x n grid
> over time and frequency representing the signal, where each frequency is
> computed from a separate FFT. So you can get away with naive distribution
> of FFTs across cores with no regard for cache coherence and still get very
> good performance.

Interesting, I've never dealt with problems in this domain - most of my
performance problems involve relatively simple transforms over streams of
data. It would be quite a different mindset to program for problems that fit
entirely in cache; it would be fun to try to squeeze the theoretical power
out of a chip. I guess at those levels you're mostly concerned with
preventing pipeline stalls and instruction conflicts. I think I'd pretty
much go for handwritten assembler at those levels.
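The "many small independent problems" pattern Jon describes can be sketched on the JVM too. A hedged illustration in Java (not Clojure): the radix-2 FFT, the batch size, and the 1,024-element transform length are all just example choices; each task touches only its own ~16 KB of data, so the tasks run out of cache with no coordination between cores.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ManySmallFfts {
    // In-place radix-2 Cooley-Tukey FFT; n must be a power of two.
    static void fft(double[] re, double[] im) {
        int n = re.length;
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly passes.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            for (int i = 0; i < n; i += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = i + k, b = i + k + len / 2;
                    double vr = re[b] * wr - im[b] * wi;
                    double vi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - vr; im[b] = im[a] - vi;
                    re[a] += vr;        im[a] += vi;
                }
            }
        }
    }

    public static void main(String[] args) {
        int transforms = 1024, n = 1024;   // many small, independent problems
        double[][] re = new double[transforms][n];
        double[][] im = new double[transforms][n];
        for (double[] row : re) Arrays.fill(row, 1.0);  // constant test signal

        // Naive distribution across cores: no shared state, no regard for
        // which core gets which transform.
        IntStream.range(0, transforms).parallel()
                 .forEach(t -> fft(re[t], im[t]));

        // FFT of a constant signal is an impulse: bin 0 holds the sum n.
        System.out.println(re[0][0]);  // prints 1024.0
    }
}
```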
> I would actually say that intercore cache effects are more important than
> conventional cache coherence today because you get massive performance
> degradation if you cock it up, and it is not at all obvious when that
> might occur because it depends upon things like where the allocator places
> your data. For example, if you have cores mutating global counters then
> you must make sure they are spaced out far enough in memory that none
> share cache lines.

Effects like that and cache-line aliasing are difficult to diagnose without
good tools. Not to mention that on most machines you have an OS scheduling
other threads and polluting your cache.

Brad

You received this message because you are subscribed to the Google Groups "Clojure" group.
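Jon's counter example (false sharing) can be sketched in Java. This is a hedged demo, not a benchmark harness: the thread count, iteration count, and the 64-byte cache-line size are assumptions about typical hardware, and the JVM gives no alignment guarantee for arrays, so the spacing is a heuristic (recent JDKs have an internal @Contended annotation for this). Both runs compute the same totals; the `stride = 1` run typically takes noticeably longer because the counters ping-pong one cache line between cores.

```java
public class PaddedCounters {
    static final int THREADS = 4;
    static final int ITERS = 1_000_000;

    // Run THREADS writers, each bumping its own slot. `stride` controls how
    // far apart (in longs) the slots are: stride = 1 packs several counters
    // onto one 64-byte cache line; stride = 8 gives each counter a line to
    // itself (8 longs * 8 bytes = 64 bytes), assuming 64-byte lines.
    static long run(int stride) {
        long[] slots = new long[THREADS * stride];
        Thread[] ts = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int idx = t * stride;
            ts[t] = new Thread(() -> {
                for (int i = 0; i < ITERS; i++) slots[idx]++;
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try { th.join(); } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        long sum = 0;
        for (int t = 0; t < THREADS; t++) sum += slots[t * stride];
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long shared = run(1);   // adjacent counters: false sharing
        long t1 = System.nanoTime();
        long padded = run(8);   // spaced counters: independent cache lines
        long t2 = System.nanoTime();
        System.out.printf("shared=%d (%d ms), padded=%d (%d ms)%n",
                shared, (t1 - t0) / 1_000_000, padded, (t2 - t1) / 1_000_000);
    }
}
```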