David Fotland: <05c301ca242f$03433740$09c9a5...@com>: >How much would you lose for 19x19 board? A board representation is not very >interesting unless it scales to 19 line boards.
Wait for Intel Larabee processor which has 512 bit SIMD registers. Hideki >From: [email protected] >[mailto:[email protected]] On Behalf Of René van de >Veerdonk >Sent: Sunday, August 23, 2009 1:11 PM >To: [email protected] >Subject: [computer-go] Bitmap Go revisited and mockup implementation > > > >Hi all, > > > >After years of reading this very interesting list, I decided to make my >first contribution this week after reading once again about bitmap go. There >is no freely available implementation of a bitmap based engine that I know >of, so here is a start. Note that I am not a professional programmer, >self-taught, and this is a part-time hobby, so do not expect perfect form. > > > >The attachment is an attempt to create an efficient implementation of a >bitmap based light playout engine, following mostly the recommendations as >spelled out by Antoine de Maricourt in his November 2006 message to this >list. The application is single threaded and achieves around 75-80 kpps on >my Centrino Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++ >using the Microsoft Visual Studio 2008 IDE for Windows XP (32-bit) and >undoubtedly suffers from portability issues as a result. The 128-bit SSE >instructions are called through intrinsic functions, hopefully at least >these interfaces are the same for other compilers. The single goal of this >mockup was to do some benchmarking, so there is no gtp-interface, but that >would be rather easy to add for those skilled in the art. The standard rules >are used with one exception, the Brown eye-rule is used. Multi-stone suicide >is explicitly forbidden. No superko checks are performed during playouts, >and there is no mercy-rule either. > > > >I would be particularly interested to know if a 64-bit OS will improve the >performance significantly, as it does have a lot less register pressure. >Moving to a 64-bit OS will also make some instructions available that allow >for further simplifications/efficiency improvements in the code. As will >enabling SSE3 or SSE4.1 instructions. > > > >One note, the random number generator used is from Agner Fog's public >library and was not included in the attachment. You will have to provide a >random number generator yourself or download the appropriate version from >his well-known website (www.agner.org/optimize) > > > >I hope this is the start of a discussion about further improvements that can >be made, but you may use this code as you wish with no restrictions or >guarantees. See the readme.txt file included in the attachment for more >details (also partially included below). > > > >Comments and feedback welcome, > > > >René van de Veerdonk > > > >Excerpts from the readme.txt file: > > > >Introduction: > >============= > > > >Mockup of a high performance implementation of the game of Go using bitmaps > >as the principle datastructures. This implementation is limited to measuring > >the performance of the principle components by playing random playouts from > >an empty 9x9 board, using Chinese scoring, not allowing pass moves until > >no other moves are available. Suicide is not allowed, neither for single > >or multi-stone groups. No checks are implemented for superko, but simple > >ko's are disallowed explicitly. > > > >Some implementation details: > >============================ > > > >Benchmarking is done in test_board.cpp, where the number of playouts (n) and > >repeats (j) for each test is set. Benchmarking is done starting from an >empty > >9x9 board until two (three with ko) passes are played in sequence. The > >benchmark playout follow the same scheme as method >board_t::play_random_game, > >but with timing instrumentation added. > > > >Clock counts (cc) are measured using the cc_time_of macro. Total run-time > >measurement relies on the windows library. > > > >For each move during a playout, the program goes through a sequence of > >(1) identify all legal moves (~50cc) > > this follows Brown's standard definition; this definition differs from > > most Monte-Carlo engines and has a different set of pros and cons; > > bitmaps allow all moves to be processed in parallel > >(2) pick a uniformly random move from the available choices (~75cc) > > relies on a modulus operation; is not 100% uniform to save speed > >(3) playing it on the board (~170cc) > > lengthiest method; but rather straightforward > > > >Playouts end after 256 moves or whenever two passes are played in succession > >(three in case of a ko). The routines for each of these basic steps are > >contained in the file board.cpp. The underlying datastructures are 128-bit > >bitmaps, for which the sse2 instructions used to manipulate them are >contained > >in the file bitmap.cpp. > > > >Further improvements can be expected from using any 64-bit OS that supports > >several more assembly instructions, and from using sse4.1 instructions. > > > >Example output (on a Intel 2.40 GHz Centrino Duo labtop): > >========================================================= > > > >[game] = 30469.7, [moves] = 111.071 > >[game] = 30245.4, [moves] = 111.068 > >[game] = 30221.7, [moves] = 111.089 > >[game] = 30264.8, [moves] = 111.122 > >[game] = 30205.8, [moves] = 111.101 > > > >[game] = 77.1084 kpps, [moves] = 111.023 > >[game] = 78.0488 kpps, [moves] = 111.045 > >[game] = 77.1084 kpps, [moves] = 111.046 > >[game] = 78.0488 kpps, [moves] = 111.131 > >[game] = 78.0488 kpps, [moves] = 111.082 > > > >[legal] 110/51, [pick] 110/74, [play] 106/169, [score] 40, [win] 0.4187 > >[legal] 111/51, [pick] 111/74, [play] 106/168, [score] 42, [win] 0.4201 > >[legal] 111/52, [pick] 111/74, [play] 106/169, [score] 43, [win] 0.4276 > >[legal] 111/52, [pick] 111/75, [play] 106/170, [score] 41, [win] 0.4092 > >[legal] 111/51, [pick] 111/74, [play] 106/171, [score] 45, [win] 0.4221 > > > >finished, press any key to exit > > > >Explanation of output: > >====================== > > > >There are three sets of repeated (5x) measurements, performed and timed > >independently to avoid cross-contamination of inserted serialization > >instructions and timing overhead (which still does not quite prevent > >reordering of instructions across timing boundaries). > > > >[game] cc's per full playout > >[moves] average number of moves per playout > > > >[game] thousands of playouts per second (kpps) > >[moves] average number of moves per playout > > > >"calls per playout" / "cc's per call" > >[legal] identifies all legal moves on the board > >[pick] picks a random move > >[play] processes a move > >[score] scores a final position > >[win] winrate for black > > > >Acknowledgements: > >================= > > > >Antoine de Maricourt, for his very helpful description of his bitmap go > > implementation in detail on the computer-go mailing list: > > http://computer-go.org/pipermail/computer-go/2006-November/007136.html > >Gunnar Farneback, for providing the well documented Brown engine, > > whoms legal move definition is used in this mockup > >Lukasz Lew, for the high performance Libego implementation, which has >inspired > > various parts of this mockup as well > >Agner Fog, for the Mersenne Twister and clock-cycle timing library > >Computer-go mailing list, where many ideas are posted and discussed > > > > >---- inline file >_______________________________________________ >computer-go mailing list >[email protected] >http://www.computer-go.org/mailman/listinfo/computer-go/ -- [email protected] (Kato) _______________________________________________ computer-go mailing list [email protected] http://www.computer-go.org/mailman/listinfo/computer-go/
