On Fri, Oct 23, 2009 at 01:34:29PM -0600, Brian Sheppard wrote:
> I have not done any GPU experiments, so readers should take my guesswork
> FWIW. I think the code that is "light" is the only piece that parallelizes
> efficiently. Heavy playouts look for rare but important situations and
> handle them using specific knowledge. If a situation is "rare" then a warp
> of 32 playouts won't have many matches, so it will stall the other cores.
>
> I have no data regarding the probability of such stalls in heavy playouts,
> but I think they must be frequent. For example, if the ratio of heavy to
> light playouts is a four-fold increase in CPU time, then a sequential
> program is spending 75% of its time identifying and handling rare
> situations.
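Just to make the warp-divergence concern concrete, here is an untested
CUDA sketch of a playout-per-thread kernel with the kind of
data-dependent branch you describe (all names here are made up, the
__device__ functions are stubs):

  /* One playout per thread; the rare-case branch is what serializes
   * the warp. */
  __device__ int is_rare_situation(const int *board) { return board[0] & 1; }
  __device__ int expensive_specific_knowledge(const int *board) { return 1; }
  __device__ int cheap_light_move(const int *board) { return 0; }

  __global__ void heavy_playout_step(const int *boards, int *moves,
                                     int bsize, int n)
  {
      int tid = blockIdx.x * blockDim.x + threadIdx.x;
      if (tid >= n)
          return;
      const int *board = boards + tid * bsize;

      if (is_rare_situation(board)) {
          /* Even if only 1 thread in a 32-wide warp takes this path,
           * the other 31 sit idle until it finishes, so the warp pays
           * the heavy cost on almost every step. */
          moves[tid] = expensive_specific_knowledge(board);
      } else {
          moves[tid] = cheap_light_move(board);
      }
  }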
My experiment used light playouts, but it is already set up for heavier
ones, since it picks the move according to a probability distribution
(which by itself is easy to do without any branch stalls).
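Something along these lines, for instance (a simplified, untested
sketch, not the exact code I run; names are made up):

  /* Weighted move selection without a divergent branch: every thread
   * runs the same fixed-trip loops, so the warp stays in lockstep.
   * r is a uniform random number in [0, 1). */
  __device__ int pick_weighted_move(const float *weights, int npoints, float r)
  {
      float total = 0.0f;
      for (int i = 0; i < npoints; i++)
          total += weights[i];          /* same trip count everywhere */

      float target = r * total;
      float acc = 0.0f;
      int move = 0;
      for (int i = 0; i < npoints; i++) {
          acc += weights[i];
          /* Should compile to a predicated select, not a branch. */
          move = (acc < target) ? i + 1 : move;
      }
      return min(move, npoints - 1);    /* guard against fp round-off */
  }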
I think adding that to the other experiment (playout-per-thread instead
of intersection-per-thread) wouldn't be too difficult, and as long as
you track liberty counts, CrazyStone-style playouts are just a single
step away, I'd guess. I think the problem then is not branch stalls but
getting the memory bandwidth for the pattern matching; I'm wondering
whether some clever GPU-specific texture tricks could be used for that,
but I haven't looked into it in depth (yet)...
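Something like this is what I have in mind (pure guesswork, untested;
the encoding and names are made up): encode the 3x3 neighbourhood as a
16-bit index and fetch the pattern weight through the texture cache
instead of plain global memory.

  /* 8 neighbours, 2 bits each (empty/black/white/edge) -> 4^8 = 65536
   * possible patterns, i.e. a 256 KB float table bound to a texture. */
  texture<float, 1, cudaReadModeElementType> pattern_tex;

  __device__ float pattern_weight(const int *board, int pt, int stride)
  {
      const int offs[8] = { -stride - 1, -stride, -stride + 1,
                            -1,                    1,
                            stride - 1,  stride,  stride + 1 };
      int idx = 0;
      for (int i = 0; i < 8; i++)
          idx = (idx << 2) | (board[pt + offs[i]] & 3);
      return tex1Dfetch(pattern_tex, idx);
  }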
> BTW, it occurs to me that we can approximate the efficiency of
> parallelization by taking execution counts from a profiler and
> post-processing them. I should do that before buying a new GPU. :-)
I wonder what you mean by that.
--
Petr "Pasky" Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/