On 3 Nov 2001, at 21:40, Kel Utendorf wrote:

> At 21:01 11/03/2001 -0500, George Woltman wrote:
> >Can prime95 take advantage of SMT? I'm skeptical. If the FFT is broken
> >up to run in two threads, I'm afraid L2 cache pollution will negate any
> >advantage of SMT. Of course, I'm just guessing - to test this theory out we
> >should compare our throughput running 1 vs. 2 copies of prime95 on an
> >SMT machine.

I'm not sure I fully understand how an SMT processor would utilise its cache, but I can't see how the problem could be worse than running two copies of a program on an SMP system. That seems to work fairly well under both Windows and Linux (attaching a thread to a processor, and therefore to its associated cache, rigidly in the case of Windows, loosely but intelligently in the case of Linux). If an SMT processor has a unified cache, cache pollution should surely not be too much of a problem?

Running one copy, and thereby getting the benefit of the full cache size, would run that one copy faster (just as happens with SMP systems, where memory bandwidth can be crucial), but the total throughput with two copies running would surely be greater. Especially on a busy system, where two threads get twice as many timeslices as one!

If there is some way in which the FFT could be broken down into roughly equal-sized chunks, it _might_ be worth synchronizing two streams so that e.g. transform in on one thread always ran in parallel with transform out on the other, and vice versa. Obviously you'd need to be running on two different exponents but using the same FFT length to gain from this technique. Whether this would be any better than running unsynchronized would probably require experimentation.

> Could things be setup so that factoring and LL-testing went on
> "simultaneously?" This would speed up the overall amount of work
> being done.

Because trial factoring, or P-1/ECM on _small_ exponents, has a very low memory bus loading, running an LL test and factoring in parallel on a dual-processor SMP system makes a lot of sense. I suspect the same situation would apply in an SMT environment.
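
For anyone wanting to try the 1 vs. 2 copies comparison suggested at the top of this thread, here is a rough sketch of how the measurement might be set up. It is only an illustration under several assumptions: the workload is a toy Lucas-Lehmer loop on Python integers standing in for prime95 (it will not stress the cache or memory bus the way prime95's FFT does), the names lucas_lehmer, worker and throughput are made up for this example, the logical CPU numbers 0 and 1 are assumed to be the two you want to compare (which numbers are SMT siblings depends on the OS), and pinning via os.sched_setaffinity is only available on Linux.

```python
import multiprocessing as mp
import os
import time


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime p (Lucas-Lehmer test)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def worker(cpu, exponent, repeats):
    # Pin this process to one logical CPU where the OS supports it (Linux only).
    if cpu is not None and hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, {cpu})
        except OSError:
            pass  # that CPU is not available to us; run unpinned instead
    for _ in range(repeats):
        lucas_lehmer(exponent)


def throughput(cpus, exponent=2203, repeats=20):
    """Tests completed per second of wall time, one worker per entry in cpus.

    The wall time includes process start-up, so keep repeats large enough
    that start-up cost is small compared with the work itself.
    """
    procs = [mp.Process(target=worker, args=(c, exponent, repeats)) for c in cpus]
    start = time.perf_counter()
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    wall = time.perf_counter() - start
    return len(cpus) * repeats / wall


if __name__ == "__main__":
    one = throughput([0])      # one copy
    two = throughput([0, 1])   # two copies, one per logical CPU
    print(f"1 copy:   {one:.1f} tests/s")
    print(f"2 copies: {two:.1f} tests/s (ratio {two / one:.2f})")
```

A ratio close to 2 would mean the second copy costs the first almost nothing; a ratio close to 1 would mean the two copies are fighting over shared resources. Only a run of the real program on real SMT hardware can settle the cache pollution question.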

The "problem" with mass deployment (almost everyone in this position, instead of only a few of us) is that far more LL testing effort is required than trial factoring effort, so running two LL tests in parallel but inefficiently would bring us to "milestones" faster than the efficient LL/trial factoring split.

Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
