Hi Michael, thanks for investigating this. My answer would have been along the lines of "beyond our control". Jon coded most of the threading stuff, and I believe that MAX_NUMTHREADS is indeed somewhat arbitrary. However, I believe there is a bit of memory consumption, and possibly also some extra CPU time, involved in setting it higher. Hopefully Jon will pitch in.
Christian.

On Fri, Aug 7, 2009 at 6:55 AM, Michael Petch <[email protected]> wrote:
> Howdy Louis,
>
> I think that MAX_NUMTHREADS was an artificial limit set by the hardware of
> the day. Christian can likely tell you why it is 16 specifically, but I am
> assuming that it was a somewhat arbitrary (and reasonable) value based on
> the cores available on most systems.
>
> On to your OS/X issue. I did a bit of research, and my original view on
> waiting for Snow Leopard may actually be all that is required.
>
> Nehalem processors diverge from the previous generation of Intel processors
> because they are no longer based on SMP (Symmetric Multiprocessing) designs.
> In an SMP system, all processors generally have access to main memory (RAM)
> via a single data bus. The problem, of course, is that the more cores you
> have, the more contention there is for memory reads/writes on that one bus.
>
> Intel decided that SMP designs likely will not scale properly in the future
> when dealing with large core counts (32, 64, 128 cores, etc.), so they moved
> their Nehalem design to NUMA-type systems instead of SMP. NUMA is non-uniform
> memory access. In this type of design, cores may not necessarily be able to
> share memory with other processors without some help. I'm not going to get
> into the gory details, but the bus system Intel is pushing is the QPI
> (QuickPath Interconnect) bus. This literally replaces the good old FSB
> (Front Side Bus).
>
> NUMA architectures do allow for the concept of "remote" and "local" data.
> Shared data may not be directly available to a processor, but it can be
> retrieved (remotely), although that is slower. Operating system kernels need
> NUMA support in order for shared data access across different buses to work
> properly.
>
> So you're asking, why tell me all this? Well, the answer is simple. Apple,
> in their infinite wisdom, started using new QPI/NUMA hardware without
> actually fully implementing NUMA in its current kernel!
> This hasn't been well documented by Apple, but it was discovered when
> companies started running Xserve on the new QPI/Nehalem systems.
>
> Without proper NUMA support, processors can't arbitrarily share memory with
> all other processors, which seems to be the case here with GnuBG. Gnubg
> launches in a single process and then asks OS/X to create threads (with
> shared memory requirements). It appears that by default each processor is
> considered a separate entity without sharing (on OS/X Leopard). The
> exception is that each core appears as 2 virtual cores. Virtual cores are
> on the same processor, thus the same bus, so one can share memory across
> them.
>
> It seems that when Gnubg launches, all the threads are created on one
> processor (originally chosen by OS/X) and are accessible by 2 virtual cores
> (using Hyper-Threading). It seems Apple did this so they could put out new
> equipment before the next OS (Snow Leopard) was released.
>
> So what does Snow Leopard have that Leopard doesn't? NUMA support.
>
> My guess is that if you got your hands on Snow Leopard, you may find that
> what you are seeing changes. Apparently this very problem exists for people
> using CS4 (Adobe's Creative Suite 4).
>
> Linux supports NUMA; you might be adventuresome and try installing Linux on
> your Apple hardware and see what happens.
>
> Your chess program may work because of the way it splits up tasks (it may
> even use a combination of POSIX threads and separate process spaces). I
> haven't seen the source code, so it's very hard to say.
>
> Michael Petch
>
> On 06/08/09 10:29 AM, "Louis Zulli" <[email protected]> wrote:
>
>> Hi,
>>
>> I put
>>
>> #define MAX_NUMTHREADS 64
>>
>> in multithread.h and rebuilt.
>>
>> In Settings-->Options-->Other, I set Eval Threads to 64.
>>
>> I then let gnubg analyze a game using 4-ply analysis.
>>
>> According to my unix top command, gnubg had 69 threads and was using
>> 188% CPU.
>> So apparently all the threads were running (into each other!)
>> on one physical core.
>>
>> In any case, increasing the max number of threads above 16 seems
>> trivial to do, unless I'm missing something.
>>
>> Louis
>>
>> On Aug 6, 2009, at 11:34 AM, Ingo Macherius wrote:
>>
>>> Do you use the calibrate command or a batch analysis of match files?
>>> The former was shown to be of no value for benchmarks, see here:
>>> http://lists.gnu.org/archive/html/bug-gnubg/2009-08/msg00006.html
>>>
>>> With calibrate I had the very same effect of high idle times during
>>> benchmarks, unless I used at least 8 threads per physical core.
>>>
>>> I am doing a benchmark on a 4-core machine which iterates over #threads
>>> (1..6) and cache size (2^1 .. 2^27). It should be posted in, say,
>>> 3 hours; it literally is still running :)
>>>
>>> Ingo
>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On
>>>> Behalf Of Louis Zulli
>>>> Sent: Thursday, August 06, 2009 3:21 PM
>>>> To: Michael Petch
>>>> Cc: [email protected]
>>>> Subject: [Bug-gnubg] Re: Getting gnubg to use all available cores
>>>>
>>>> On Aug 5, 2009, at 4:02 PM, Michael Petch wrote:
>>>>
>>>>> I'm unsure how the architecture is deployed and how OS/X handles the
>>>>> physical cores, but it almost sounds like one physical core is being
>>>>> used (using Hyperthreads to run 2 threads simultaneously). I wonder
>>>>> if the memory is shared across all the cores? A friend of mine was
>>>>> suggesting that people may have to wait for Snow Leopard to come out
>>>>> before OS/X properly utilizes the Nehalem architecture (whether that
>>>>> is true or not, I don't know).
>>>>>
>>>>> Anyway, as an experiment: if you run 2 copies of Gnubg at the same
>>>>> time (using multiple threads), do you get 400% CPU usage?
>>>>
>>>> Hi Mike,
>>>>
>>>> Sorry for the delay.
>>>> I just had two copies of gnubg analyze the same
>>>> game, using 3-ply analysis. Each instance of gnubg used 200% CPU.
>>>> Each copy was set to use 4 evaluation threads.
>>>>
>>>> So what's the verdict here? Is Leopard simply not directing threads
>>>> correctly?
>>>>
>>>> Louis
>>>>
>>>> _______________________________________________
>>>> Bug-gnubg mailing list
>>>> [email protected]
>>>> http://lists.gnu.org/mailman/listinfo/bug-gnubg
