On Fri, 3 Feb 2012 16:59:53 -0800, Randy Little <[email protected]> wrote:
> Peter and Deke both said thats not the case anymore and that using the
> hyper threads since westmer i7 came out would be just as fast. To which I
> said what you just said. I just wanted to verify it. In fact even in comp
> in the GUI 8 is faster then 16. So either Peter and Deke are wrong or Mac
> threading isn't on par with the other platforms.

I can't see the email I sent in the mailing list (seems to have disappeared), so I'm not sure *what* I actually said, but I'll clear up what the situation *should* be in *most* cases:

With Nehalem (and the Westmere process shrink), Intel re-introduced Hyperthreading to their processors, and it was a lot more useful in terms of increasing processor throughput compared to the Pentium 4 days, when Hyperthreading was rarely useful (because in those days all the memory data went across the FSB and was pretty much bottlenecked, so the processor could never get the data fast enough for Hyperthreading to be useful). This is why in Nuke we ignore the virtual cores by default when setting the number of threads: in the P4 days, it wasn't great.

It was a lot better on the Nehalems (and Westmere) due to the replacement of the FSB with Intel's QuickPath Interconnect, which had a lot more bandwidth. However, there was still the issue that, for each processor cycle, the Nehalems and Westmeres (which predate the AVX instruction set) could only load/store 1 float item (or 4 when using SSE). That meant they were still bottlenecked.

With Sandy Bridge, however, Intel doubled this to two float loads/stores per cycle (that is, 2 SSE vectors or 1 AVX vector of floats), which in theory should double the bandwidth available to the processor in the best of cases. So that means the Sandy Bridge processors should be quite a bit faster than Westmeres in tight float processing code.

What you actually see, however, is another thing. I've seen close to a doubling of performance when going from a Westmere processor to a fairly similar (in clock speed, anyway) Sandy Bridge one for other code I've written, but Nuke is still very often I/O bound, so it's very difficult to say. I've never actually done a comparison with Nuke.

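To make the "ignore the virtual cores by default" behaviour mentioned above concrete, here is a minimal Python sketch. It is not Nuke's actual code: the function name, its parameters, and the assumption that every physical core exposes exactly two logical CPUs are all hypothetical and only for illustration.

```python
import os


def default_thread_count(logical_cpus=None, threads_per_core=2,
                         use_virtual_cores=False):
    """Pick a worker-thread count, skipping Hyperthreading siblings by default.

    Hypothetical helper for illustration only.
    """
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs and may return None.
        logical_cpus = os.cpu_count() or 1
    if use_virtual_cores:
        return logical_cpus
    # Assumption: each physical core exposes `threads_per_core` logical CPUs.
    return max(1, logical_cpus // threads_per_core)


# A machine that reports 16 logical CPUs with Hyperthreading enabled:
print(default_thread_count(16))                          # 8
print(default_thread_count(16, use_virtual_cores=True))  # 16
```

Timing the same render with both values is the simplest way to check which one is actually faster on a given machine.
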
Also, I should mention that while we're talking about potential threading issues on Mac, the OS X scheduler (organising when threads get processor time) is pretty atrocious when I've profiled it for both Nuke stuff and other stuff. Linux with a recent (2.6.34+) kernel definitely does a much better job of keeping the processors fully utilised when I've compared the same code doing similar workloads.

So, TLDR:

1. The issue might be that Nuke is still I/O bound
2. Trying it on a Sandy Bridge *should* give better results than Westmere
3. OS X's scheduler isn't great.

Cheers,
Peter
