Nice to see somebody working on accelerator hardware for Go (again). There actually were some attempts in the past and even publications that might be interesting for you [1][2].
I think in principle you have two possible directions when looking at accelerating basic MCTS for Go: a) Accelerating the computation of single playouts b) Computing a large number of playouts in parallel (likely using some pipelining technique) The problem with b) is that N parallel simulations are less valuable then N sequential ones, as simulations in MCTS are guided depending on former simulation outcomes. See e.g. [3] for according experimental results. You would likely end up with something that was called leaf-parallelization [4]. a) would be great, but is extremely difficult to achieve. The computation of strong playouts doesn't pose to much points for fine-grained parallelization and kind as well as amount of computations that need to be done for the generation and placement of moves is pretty inhomogeneous. Likely your hardware will run at a speed of about 150 to 200 Mhz (?), making it difficult to achieve a significant speedup over modern CPUs with 8 or more cores each running at more than 3 Ghz. I'm somehow convinced that looking at accelerator hardware for MCTS Go might be a bit to early. The algorithm and heuristics used are still changing a lot from year to year and adapting your HW design is likely more involved than adapting software. But that's just my personal point of view. Good luck! - Lars [1] Gao, Haiying, Wang, Fuming, Lei, Wei, and Lin, Yun: Monte Carlo simulation of 9x9 Go game on FPGA , IEEE International Conference on Intelligent Computing and Intelligent Systems, 865–869, October 2010 [2] Koizumi, Kenichi, Inaba, Mary, Hiraki, Kei, Ishii, Yasuo, Miyoshi, Takefumi, and Yoshizoe, Kazuki: Triple Line-Based Playout for Go - An Accelerator for Monte Carlo Go, Proceedings of the 2009 International Conference on Reconfigurable Computing and FPGAs, 161–166, 2009 [3] Segal, Richard B.: On the Scalability of Parallel UCT, International Conference on Computer and Games, 36–47, 2010 [4] Chaslot, Guillaume M.J-B., Winands, Mark H.M., and van den Herik, H. Jaap: Parallel Monte-Carlo Tree Search, Conference on Computers and Games, 60–71, 2008 On Wed, 2013-05-22 at 06:35 +0000, Рождественский Дмитрий wrote: > Incredible, 100 nanoseconds is only about 300 instructions of a CPU. > Are you talking about 19x19? And 1 microsecond for my design will > probably be a worst-case (as I calculate freedom and capture > iteratively). When almost all stones have free places around it will > be down to ~100 nanoseconds. > As to the number of possible accelerators on-chip - it varies upon > price. I think it can be 5-250, for the price $250-$5000. So the cost > of a single simple accelerator will be $20-$50. > > Dmitry > > 21.05.2013, 23:13, "Mark Boon" <[email protected]>: > > Sounds interesting. But 1 microsecond for a move is not particularly > > fast. There are already implementations that do that in the 100-300 > > nanoseconds range on one core. 1 microsecond is probably considered > > as 'semi-light' playout. I suppose the question then becomes, how > > many of these could your accelerator do in parallel? > > > > Mark > > > > > > On Tue, May 21, 2013 at 8:06 AM, Alexander Kozlovsky > > <[email protected]> wrote: > > Я тоже кстати из ЛИАПа, с четвертого факультета, может и > > пересекались :) > > > > > > On Tue, May 21, 2013 at 7:02 PM, Рождественский Дмитрий > > <[email protected]> wrote: > > Hi all, > > > > I have got an idea to create a hardware accelerator > > for Go playing software. It will probably be a USB > > (or maybe PCI-Express) device that will be able to > > do some basic, but very time-consuming for > > general-purpose CPU calculations very fast. For > > example load a goban layout, make a number of random > > moves (as used in Monte-Carlo algorithm) and unload > > result back to a computer. > > > > As long as it will be a hardware, it will be able to > > do specified calculations only, but the speed will > > be very high. For example, making just a copy of the > > particular goban layout will require typically about > > 10 nanoseconds only (one internal clock cycle). > > Calculation of the validity and results of a > > particular move (including a check for ko and > > captured stones) will probably take 1 microsecond. > > This as usual may vary during debugging, but the > > current move calculation engine draft I've started > > to develop is about this figures. > > > > My nearest aims here are: > > - to understand a demand from go playing software > > developers, and > > - to understand what particular calculation chains > > are most demanded for hardware acceleration. > > > > Dmitry > > _______________________________________________ > > Computer-go mailing list > > [email protected] > > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go > > > > _______________________________________________ > > Computer-go mailing list > > [email protected] > > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go > > , > > > > _______________________________________________ > > Computer-go mailing list > > [email protected] > > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go > > -- Lars Schaefers Computer Engineering Group of Prof. Dr. Marco Platzner Paderborn Center for Parallel Computing, University of Paderborn Pohlweg 47-49, 33098 Paderborn, Germany Tel: +49 (0)5251 60 4341, Fax: +49 (0)5251 60 5377 Office: Building O 3.119 _______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
