On Fri, 3 Feb 2012 16:59:53 -0800, Randy Little <[email protected]>
wrote:
> Peter and Deke both said that's not the case anymore and that using the
> hyper threads since the Westmere i7 came out would be just as fast.  To
> which I said what you just said.  I just wanted to verify it.  In fact,
> even in comp in the GUI, 8 is faster than 16.  So either Peter and Deke
> are wrong or Mac threading isn't on par with the other platforms.

I can't see the email I sent in the mailing list (seems to have
disappeared), so I'm not sure *what* I actually said, but I'll clear up
what the situation *should* be in *most* cases:

With Nehalem (and the Westmere process shrink), Intel re-introduced
Hyperthreading to their processors and it was a lot more useful in terms of
increasing processor throughput compared to the Pentium 4 days, when
Hyperthreading was rarely useful (because in those days all the memory data
went across the FSB and was pretty much bottlenecked, so the processor
could never get the data fast enough for hyperthreading to be useful). This
is why in Nuke we ignore the virtual cores by default when setting the
number of threads, because in the P4 days, using them wasn't great.
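
To make the "ignore the virtual cores" default concrete, here's a minimal
sketch of the idea. This is not Nuke's actual code: the function name and
the threads_per_core parameter are made up for illustration, and it assumes
two hardware threads per physical core, as on Hyperthreading parts.

```python
import os


def default_thread_count(logical_cpus=None, threads_per_core=2):
    """Return a thread count that skips the virtual (Hyperthreading) cores.

    threads_per_core is an assumption: 2 on Hyperthreading parts, 1 otherwise.
    """
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs, so virtual cores are included.
        logical_cpus = os.cpu_count() or 1
    # Never go below one thread, even on a single-CPU machine.
    return max(1, logical_cpus // threads_per_core)


# A machine that shows 16 logical CPUs would default to 8 threads.
print(default_thread_count(16))  # 8
```

On a machine without Hyperthreading you'd pass threads_per_core=1 and get
the logical CPU count back unchanged.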

It was a lot better on the Nehalems (and Westmeres) due to the replacement
of the FSB with Intel's QuickPath Interconnect, which had a lot more
bandwidth. However, with the introduction of the AVX instruction set, there
was still the issue that for each processor cycle, the Nehalems and
Westmeres could only load/store 1 float item (or 4 when using SSE). That
meant they were still bottlenecked.

With Sandy Bridge, however, Intel doubled this to two float stores/loads
per cycle (and 2 SSE floats, and 1 AVX float), which in theory should
double the bandwidth available to the processor in the best of cases. So
that means the Sandy Bridge processors should be quite a bit faster than
Westmeres in tight float processing code.
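
As a back-of-the-envelope illustration of that doubling, the sketch below
works out the theoretical peak for scalar float loads/stores from the
per-cycle figures above. The 3 GHz clock is a made-up number (the same for
both chips so that only the per-cycle figure differs), and the function
name is just for illustration. Real code will get nowhere near these peaks.

```python
FLOAT_BYTES = 4  # size of one single-precision float


def peak_float_bandwidth(floats_per_cycle, clock_hz):
    """Upper bound on float load/store traffic, in bytes per second."""
    return floats_per_cycle * FLOAT_BYTES * clock_hz


CLOCK_HZ = 3.0e9  # hypothetical 3 GHz clock, identical for both chips

westmere = peak_float_bandwidth(1, CLOCK_HZ)      # 1 float per cycle
sandy_bridge = peak_float_bandwidth(2, CLOCK_HZ)  # 2 floats per cycle

print(westmere / 1e9, "GB/s")       # 12.0 GB/s
print(sandy_bridge / 1e9, "GB/s")   # 24.0 GB/s
print(sandy_bridge / westmere)      # 2.0
```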

What you actually see, however, is another thing - I've seen close to
doubling performance when going from a Westmere processor to a fairly
similar (clock speed anyway) Sandy Bridge one for other code I've written,
but Nuke still is very often I/O bound, so it's very difficult to say -
I've never actually done a comparison with Nuke.
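
To show why being I/O bound muddies the picture, here's a rough
Amdahl's-law-style estimate. It assumes the I/O share of the runtime
doesn't speed up at all and only the remaining compute share does. The
function name and the I/O fractions are made up for illustration, not
measured from Nuke.

```python
def overall_speedup(io_fraction, compute_speedup):
    """Estimate total speedup when only the non-I/O share gets faster."""
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if compute_speedup <= 0:
        raise ValueError("compute_speedup must be positive")
    # The I/O share is unchanged; the compute share shrinks by the speedup.
    return 1.0 / (io_fraction + (1.0 - io_fraction) / compute_speedup)


# Hypothetical I/O shares, with compute assumed to run twice as fast.
for io_fraction in (0.0, 0.5, 0.9):
    print(io_fraction, round(overall_speedup(io_fraction, 2.0), 2))
```

With no I/O at all, doubling the compute speed doubles the overall speed.
With half the time spent in I/O it's only about 1.33x, and with 90% in I/O
it's about 1.05x, which would be hard to tell apart from noise.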

Also, I should mention that while we're talking about potential threading
issues on Mac, the OS X scheduler (organising when threads get processor
time) is pretty atrocious when I've profiled it for both Nuke stuff and
other stuff - definitely Linux with a recent (2.6.34+) kernel does a much
better job of keeping the processors fully-utilised when I've compared the
same code doing similar workloads.

So, TLDR:

1. The issue might be that Nuke is still I/O bound.
2. Trying it on a Sandy Bridge *should* give better results than Westmere.
3. OS X's scheduler isn't great.

Cheers,
Peter
_______________________________________________
Nuke-users mailing list
[email protected], http://forums.thefoundry.co.uk/
http://support.thefoundry.co.uk/cgi-bin/mailman/listinfo/nuke-users
