Thanks, Michael. You answered my question, and at just the right level
for me.
Not feeling that adventurous, I think I'll wait for Snow Leopard
rather than experiment with Linux!
By the way, Crafty has always been open source, and has been a great
help to many would-be chess programmers. If you'd like to take a look
at the code, you can get the latest version 23.0 at
ftp://ftp.cis.uab.edu/pub/hyatt/source/crafty-23.0.zip
Thanks again,
Louis
On Aug 7, 2009, at 12:55 AM, Michael Petch wrote:
Howdy Louis,
I think that MAX_NUMTHREADS was an artificial limit set by the hardware of
the day. Christian can likely tell you why it is 16 specifically, but I am
assuming it was a somewhat arbitrary (and reasonable) value based on the
cores available on most systems.
Onto your OS/X issue. I did a bit of research, and my original suggestion
of waiting for Snow Leopard may actually be all that is required.
Nehalem processors diverge from the previous generation of Intel processors
because they are no longer based on SMP (Symmetric Multiprocessor) designs.
In an SMP system, generally all processors have access to main memory (RAM)
via a single data bus. The problem, of course, is that the more cores you
have, the more contention there is for memory reads/writes on that one bus.
Intel decided that SMP designs likely will not scale properly in the future
when dealing with large core counts (32, 64, 128 cores, etc.), so they moved
their Nehalem design to NUMA-type systems instead of SMP. NUMA is
Non-Uniform Memory Access. In this type of design, cores may not necessarily
be able to share memory with other processors without some help. I'm not
going to get into the gory details, but the bus system Intel is pushing is
the QPI (QuickPath Interconnect) bus. This literally replaces the good old
FSB (Front Side Bus).
NUMA architectures do allow for the concept of "remote" and "local" data.
Shared data may not be directly available to a processor, but it can be
retrieved (remotely), although that will be slower. Operating system kernels
need NUMA support in order for shared data access on different buses to work
properly.
So you're asking, why tell me all this? Well, the answer is simple. Apple,
in their infinite wisdom, started using new QPI/NUMA hardware without
actually fully implementing NUMA in the current kernel! This hasn't been
well documented by Apple, but it was discovered when companies started
running Xserve on the new QPI/Nehalem systems.
Without proper NUMA support, processors can't arbitrarily share memory with
all other processors, which seems to be the case here with gnubg. Gnubg
launches as a single process and then asks OS/X to create threads (with
shared memory requirements). It appears that by default each processor is
considered a separate entity without sharing (on OS/X Leopard). The
exception is that each core appears as 2 virtual cores. Virtual cores are on
the same processor, thus the same bus, so one can share memory across them.
It seems that when gnubg launches, all the threads are created on one
processor (originally chosen by OS/X) and are accessible by 2 virtual cores
(using Hyper-Threading). It seems Apple did this so they could put out new
equipment before the next OS (Snow Leopard) was released.
So what does Snow Leopard have that Leopard doesn't? NUMA support.
My guess is that if you got your hands on Snow Leopard, you may find that
what you are seeing changes. Apparently this very problem exists for people
using CS4 (Adobe's Creative Suite 4).
Linux supports NUMA; if you are feeling adventuresome, you might try
installing Linux on your Apple hardware and see what happens.
Your chess program may work because of the way it splits up tasks (it may
even use a combination of POSIX threads and separate process spaces). I
haven't seen the source code, so it's very hard to say.
Michael Petch
_______________________________________________
Bug-gnubg mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-gnubg