Re: abysmal multicore performance, especially on AMD processors

Lee Spector Fri, 07 Dec 2012 18:03:30 -0800

Thanks Andy.

My applications definitely allocate a lot of memory, which is reflected in all 
of that consing in the test I was using. It'd be hard to do what we do in any 
other way. I can see how a test using a Java mutable array would help to 
diagnose the problem, but if that IS the problem then it sounds like there'd be 
no solution short of radically re-engineering our systems. That would be sad!


My colleague who's running the tests is out for the weekend but we'll talk 
early next week and get back to you if we want to try your C programs etc.

 -Lee


On Dec 7, 2012, at 8:41 PM, Andy Fingerhut wrote:

> Lee:
> 
> I'll just give a brief description right now. In the past, on a 2-core 
> machine that was achieving much less than 2x speedup, I found that memory 
> bandwidth was the limiting factor.
> 
> Not all Clojure code allocates memory, but a lot does.  If the hardware in a 
> system can write at rate X from a multicore processor to main memory, and a 
> single-threaded Clojure program writes to memory at rate 0.5*X, then the most 
> speedup you will ever get out of multicore execution of the same code on N 
> cores will be 2x, no matter how large N is.
> 
> As one way to see if this is the problem, you could try changing your "burn" 
> function so that instead of doing cons to build up a list result, first 
> allocate a Java mutable array before the loop that is as large as you need it 
> to be at the end, and write values into that.  You can convert it to some 
> other Clojure type at the end of the loop if you prefer.
> 
> 
> I have some C benchmark programs that test memory read and write bandwidth on 
> single and multiple cores you can run on your Intel machine to see if that 
> might be the issue.  If this is the issue, I would expect to see at least a 
> little speedup from 1 core to multiple cores, but capped at some maximum 
> speedup that is determined by the memory bandwidth, not the number of cores 
> you run in parallel.
> 
> I don't currently have any guess about what might be happening with the AMD 
> multicore machine.  If you are interested in wild guessing, perhaps there 
> could be some kind of multicore cache coherency protocol that is badly 
> configured, causing cache lines to be frequently invalidated when multiple 
> cores are sharing memory?  That would make more sense if multiple cores were 
> reading from and writing to the same cache lines, which doesn't seem terribly 
> likely for a typical Clojure program.
> 
> Let me know if you are interested and I will find those C programs for you to 
> try out.  I got them from somewhere on the Internet and may have tweaked them 
> a little bit.
> 
> Andy
> 
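
The bandwidth argument in the quoted message (a machine that can write at rate X, a single thread writing at 0.5*X, and a resulting ceiling of 2x) can be worked through as a small calculation. This is only a sketch of that arithmetic: the class name BandwidthCap, the method maxSpeedup, and its parameter names are hypothetical, and the model assumes memory writes are the only shared bottleneck.

```java
public class BandwidthCap {
    /**
     * Upper bound on parallel speedup when every thread writes to memory at
     * perThreadRate and the whole machine can sustain at most totalRate.
     * Both rates must use the same (arbitrary) units.
     */
    static double maxSpeedup(int cores, double perThreadRate, double totalRate) {
        // Number of threads the memory system can feed at full speed.
        double bandwidthLimit = totalRate / perThreadRate;
        // Speedup can exceed neither the core count nor the bandwidth limit.
        return Math.min(cores, bandwidthLimit);
    }

    public static void main(String[] args) {
        double x = 1.0; // machine-wide write rate "X", arbitrary units
        for (int n : new int[] {1, 2, 4, 8, 48}) {
            // One thread writes at 0.5 * X, as in the quoted example.
            System.out.printf("%2d cores -> at most %.1fx%n",
                    n, maxSpeedup(n, 0.5 * x, x));
        }
    }
}
```

Under these assumptions the bound stops growing at 2 cores, which matches the "2x, no matter how large N is" statement above.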

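The suggested diagnostic (replace cons-based accumulation with a preallocated mutable array) might look roughly like the following when written directly in Java. The actual "burn" function is not shown in this thread, so BurnSketch, Cons, burnWithCons, and burnWithArray are hypothetical stand-ins, and the squaring is placeholder work. A single timed run like this is not a rigorous benchmark (JIT warm-up and GC timing are ignored); it only illustrates the two allocation patterns.

```java
public class BurnSketch {
    /** Minimal immutable cons cell, standing in for a Clojure list node. */
    static final class Cons {
        final long head;
        final Cons tail;
        Cons(long head, Cons tail) { this.head = head; this.tail = tail; }
    }

    /** Allocates one new heap object per iteration, like repeated consing. */
    static Cons burnWithCons(int n) {
        Cons acc = null;
        for (int i = 0; i < n; i++) {
            acc = new Cons((long) i * i, acc);
        }
        return acc;
    }

    /** Allocates a single array before the loop and only writes into it. */
    static long[] burnWithArray(int n) {
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            out[i] = (long) i * i;
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long t0 = System.nanoTime();
        Cons list = burnWithCons(n);
        long t1 = System.nanoTime();
        long[] arr = burnWithArray(n);
        long t2 = System.nanoTime();
        System.out.printf("cons:  %d ms (head=%d)%n",
                (t1 - t0) / 1_000_000, list.head);
        System.out.printf("array: %d ms (last=%d)%n",
                (t2 - t1) / 1_000_000, arr[n - 1]);
    }
}
```

Running several copies of each variant on separate threads and comparing how they scale would be one way to check whether allocation pressure is what limits the speedup.
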
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
