>In my own gpu experiment (light playouts), registers/memory were the
>bounding factors on simulation speed.

I respect your experimental finding, but I note that you have carefully
specified "light playouts," probably because you suspect that there may be a
significant difference if playouts are heavy.

I have not done any GPU experiments, so readers should take my guesswork
FWIW. I think the code that is "light" is the only piece that parallelizes
efficiently. Heavy playouts look for rare but important situations and
handle them with specific knowledge. If a situation is "rare," then few of
the 32 playouts in a warp will match it, and the divergent branch stalls
the other lanes of the warp.
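
To put a rough number on that intuition: the chance that at least one
lane of a 32-wide warp takes the rare path is 1 - (1 - p)^32. A minimal
Python sketch (the values of p are invented examples, not measurements):

    # Chance that at least one of 32 lanes hits the rare path.
    for p in (0.001, 0.01, 0.05):
        p_warp = 1.0 - (1.0 - p) ** 32
        print(f"p = {p:.3f}  ->  P(warp diverges) = {p_warp:.3f}")

Even a situation that arises in only 1% of playout steps diverges more
than a quarter of all warps.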

I have no data regarding the probability of such stalls in heavy playouts,
but I think they must be frequent. For example, if heavy playouts take
four times the CPU time of light playouts, then a sequential program
spends 75% of its time identifying and handling rare situations.
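
Carrying that arithmetic one step further: under SIMT execution, a warp
pays the rare-path cost whenever any of its 32 lanes takes it. Here is a
toy model in Python, with all numbers invented (a single rare path, its
cost scaled so that heavy playouts average 4x light):

    # Toy model of SIMT efficiency for heavy playouts (numbers invented).
    # Per step: light work costs 1 unit; the rare case fires with
    # probability p and costs c extra, with p * c = 3 (heavy = 4x light).
    WARP = 32
    for p in (0.01, 0.05, 0.10):
        c = 3.0 / p                        # extra cost per firing
        seq = 1.0 + p * c                  # average sequential cost = 4.0
        p_any = 1.0 - (1.0 - p) ** WARP    # some lane fires the rare case
        warp = 1.0 + p_any * c             # whole warp waits out the rare path
        print(f"p = {p:.2f}  efficiency = {seq / warp:.3f}")

If anything like this model holds, the divergence penalty dwarfs the
4x sequential overhead.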

My opinion is that heavy playouts are necessary. Put all this guesswork
together and you arrive at my conclusion: Fermi is probably still not
enough, but it is a step in the right direction.

BTW, it occurs to me that we can approximate the efficiency of
parallelization by taking execution counts from a profiler and
post-processing them. I should do that before buying a new GPU. :-)
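
For what it is worth, here is one shape such post-processing might take.
A minimal sketch, assuming per-branch execution counts and average costs
exported from a profiler; the branch names and numbers below are invented
placeholders, not measurements:

    # Estimate SIMT efficiency from profiler execution counts.
    # branch -> (times taken, average cost); the branches are treated as
    # mutually exclusive alternatives within one playout step.
    WARP = 32
    branches = {
        "default_move": (970000, 50),
        "atari_escape": (20000, 400),
        "nakade_check": (9000, 600),
        "seki_handler": (1000, 900),
    }
    total = sum(count for count, _ in branches.values())

    seq_cost = warp_cost = 0.0
    for count, cost in branches.values():
        p = count / total                  # per-lane branch probability
        seq_cost += p * cost               # expected sequential cost
        warp_cost += (1.0 - (1.0 - p) ** WARP) * cost  # warp runs it if any lane does
    print(f"sequential = {seq_cost:.1f}  warp = {warp_cost:.1f}  "
          f"efficiency = {seq_cost / warp_cost:.3f}")

On these made-up counts the efficiency comes out around 15%, which is
exactly the kind of number I would want before spending the money.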

