> In my own gpu experiment (light playouts), registers/memory were the bounding factors on simulation speed.
I respect your experimental finding, but I note that you have carefully specified "light playouts," probably because you suspect that there may be a significant difference if the playouts are heavy. I have not done any GPU experiments, so readers should take my guesswork FWIW.

I think the "light" code is the only piece that parallelizes efficiently. Heavy playouts look for rare but important situations and handle them using specific knowledge. If a situation is "rare," then few of the 32 playouts in a warp will match it, and the few that do will stall the rest of the warp. I have no data on the probability of such stalls in heavy playouts, but I think they must be frequent. For example, if heavy playouts cost four times as much CPU time as light ones, then a sequential program is spending 75% of its time identifying and handling rare situations.

My opinion is that heavy playouts are necessary, so if you put all this guesswork together you arrive at my conclusion: Fermi is a step in the right direction, but probably still not enough.

BTW, it occurs to me that we can approximate the efficiency of parallelization by taking execution counts from a profiler and post-processing them. I should do that before buying a new GPU. :-)
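
To make the divergence argument above concrete, here is a toy CUDA sketch. Every name in it (is_rare_situation, heavy_knowledge, light_playout) is invented for illustration, not code from any real engine; the point is only the shape of the branch:

#include <cstdio>
#include <cuda_runtime.h>

// Made-up stand-ins for the light and heavy paths of a playout step.
__device__ bool is_rare_situation(int pos) { return (pos & 31) == 0; }  // toy predicate
__device__ int  light_playout(int pos)     { return pos & 1; }          // cheap common path

__device__ int heavy_knowledge(int pos)    // expensive rare path
{
    int s = 0;
    for (int i = 0; i < 10000; ++i) s += (pos + i) % 3;  // stand-in for heavy work
    return s & 1;
}

// One playout per thread. SIMT hardware serializes a divergent branch, so
// if even one thread in a 32-thread warp hits the rare case, the other 31
// sit idle while heavy_knowledge() runs -- that is the stall in question.
__global__ void playout_kernel(const int *positions, int *results, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int pos = positions[tid];
    results[tid] = is_rare_situation(pos) ? heavy_knowledge(pos)
                                          : light_playout(pos);
}

int main()
{
    const int n = 1024;
    int pos_h[1024], res_h[1024];
    for (int i = 0; i < n; ++i) pos_h[i] = i;

    int *pos_d, *res_d;
    cudaMalloc(&pos_d, n * sizeof(int));
    cudaMalloc(&res_d, n * sizeof(int));
    cudaMemcpy(pos_d, pos_h, n * sizeof(int), cudaMemcpyHostToDevice);

    playout_kernel<<<(n + 255) / 256, 256>>>(pos_d, res_d, n);
    cudaDeviceSynchronize();

    cudaMemcpy(res_h, res_d, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("first result: %d\n", res_h[0]);
    cudaFree(pos_d);
    cudaFree(res_d);
    return 0;
}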
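
And here is roughly what I mean by post-processing profiler counts: a back-of-envelope model in plain host-side C++ (no GPU needed). Both inputs are invented; in practice p would come from profile execution counts and h from profile timings. I picked them so the sequential cost matches the four-fold example above:

#include <cmath>
#include <cstdio>

int main()
{
    const double p = 0.05;  // fraction of steps hitting a rare heavy case (invented; take from profile counts)
    const double h = 60.0;  // extra cost of the heavy handler, in light-step units (invented; take from profile times)
    const int    w = 32;    // threads per warp

    // Sequential cost per playout step: everyone pays the light step,
    // and a fraction p also pays the heavy handler.  1 + p*h = 4 here,
    // i.e. the four-fold heavy/light ratio from my example.
    const double seq = 1.0 + p * h;

    // Warp cost per step: the whole warp waits for the heavy handler
    // whenever at least one of its w threads hits a rare situation.
    const double p_any = 1.0 - pow(1.0 - p, w);
    const double warp  = 1.0 + p_any * h;

    printf("P(warp stalls per step) = %.3f\n", p_any);
    printf("parallel efficiency     = %.3f\n", seq / warp);
    return 0;
}

With these made-up numbers the warp stalls on about 80% of steps and efficiency falls below 10%, which is exactly the kind of result that would make me hesitate before buying that GPU.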
