David Fotland: <05c301ca242f$03433740$09c9a5...@com>:
>How much would you lose for 19x19 board?  A board representation is not very
>interesting unless it scales to 19 line boards.

Wait for Intel Larabee processor which has 512 bit SIMD registers.

Hideki

>From: [email protected]
>[mailto:[email protected]] On Behalf Of René van de
>Veerdonk
>Sent: Sunday, August 23, 2009 1:11 PM
>To: [email protected]
>Subject: [computer-go] Bitmap Go revisited and mockup implementation
>
> 
>
>Hi all,
>
> 
>
>After years of reading this very interesting list, I decided to make my
>first contribution this week after reading once again about bitmap go. There
>is no freely available implementation of a bitmap based engine that I know
>of, so here is a start. Note that I am not a professional programmer,
>self-taught, and this is a part-time hobby, so do not expect perfect form.
>
> 
>
>The attachment is an attempt to create an efficient implementation of a
>bitmap based light playout engine, following mostly the recommendations as
>spelled out by Antoine de Maricourt in his November 2006 message to this
>list. The application is single threaded and achieves around 75-80 kpps on
>my Centrino Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++
>using the Microsoft Visual Studio 2008 IDE for Windows XP (32-bit) and
>undoubtedly suffers from portability issues as a result. The 128-bit SSE
>instructions are called through intrinsic functions, hopefully at least
>these interfaces are the same for other compilers. The single goal of this
>mockup was to do some benchmarking, so there is no gtp-interface, but that
>would be rather easy to add for those skilled in the art. The standard rules
>are used with one exception, the Brown eye-rule is used. Multi-stone suicide
>is explicitly forbidden. No superko checks are performed during playouts,
>and there is no mercy-rule either.
>
> 
>
>I would be particularly interested to know if a 64-bit OS will improve the
>performance significantly, as it does have a lot less register pressure.
>Moving to a 64-bit OS will also make some instructions available that allow
>for further simplifications/efficiency improvements in the code. As will
>enabling SSE3 or SSE4.1 instructions.
>
> 
>
>One note, the random number generator used is from Agner Fog's public
>library and was not included in the attachment. You will have to provide a
>random number generator yourself or download the appropriate version from
>his well-known website (www.agner.org/optimize)
>
> 
>
>I hope this is the start of a discussion about further improvements that can
>be made, but you may use this code as you wish with no restrictions or
>guarantees. See the readme.txt file included in the attachment for more
>details (also partially included below).
>
> 
>
>Comments and feedback welcome,
>
> 
>
>René van de Veerdonk
>
> 
>
>Excerpts from the readme.txt file:
>
> 
>
>Introduction:
>
>=============
>
> 
>
>Mockup of a high performance implementation of the game of Go using bitmaps
>
>as the principle datastructures. This implementation is limited to measuring
>
>the performance of the principle components by playing random playouts from
>
>an empty 9x9 board, using Chinese scoring, not allowing pass moves until
>
>no other moves are available. Suicide is not allowed, neither for single
>
>or multi-stone groups. No checks are implemented for superko, but simple
>
>ko's are disallowed explicitly.
>
> 
>
>Some implementation details:
>
>============================
>
> 
>
>Benchmarking is done in test_board.cpp, where the number of playouts (n) and
>
>repeats (j) for each test is set. Benchmarking is done starting from an
>empty
>
>9x9 board until two (three with ko) passes are played in sequence. The
>
>benchmark playout follow the same scheme as method
>board_t::play_random_game,
>
>but with timing instrumentation added.
>
> 
>
>Clock counts (cc) are measured using the cc_time_of macro. Total run-time
>
>measurement relies on the windows library.
>
> 
>
>For each move during a playout, the program goes through a sequence of
>
>(1) identify all legal moves (~50cc)
>
>    this follows Brown's standard definition; this definition differs from
>
>    most Monte-Carlo engines and has a different set of pros and cons;
>
>    bitmaps allow all moves to be processed in parallel
>
>(2) pick a uniformly random move from the available choices (~75cc)
>
>    relies on a modulus operation; is not 100% uniform to save speed
>
>(3) playing it on the board (~170cc)
>
>    lengthiest method; but rather straightforward
>
> 
>
>Playouts end after 256 moves or whenever two passes are played in succession
>
>(three in case of a ko). The routines for each of these basic steps are
>
>contained in the file board.cpp. The underlying datastructures are 128-bit
>
>bitmaps, for which the sse2 instructions used to manipulate them are
>contained
>
>in the file bitmap.cpp.
>
> 
>
>Further improvements can be expected from using any 64-bit OS that supports
>
>several more assembly instructions, and from using sse4.1 instructions.
>
> 
>
>Example output (on a Intel 2.40 GHz Centrino Duo labtop):
>
>=========================================================
>
> 
>
>[game] = 30469.7, [moves] = 111.071
>
>[game] = 30245.4, [moves] = 111.068
>
>[game] = 30221.7, [moves] = 111.089
>
>[game] = 30264.8, [moves] = 111.122
>
>[game] = 30205.8, [moves] = 111.101
>
> 
>
>[game] = 77.1084 kpps, [moves] = 111.023
>
>[game] = 78.0488 kpps, [moves] = 111.045
>
>[game] = 77.1084 kpps, [moves] = 111.046
>
>[game] = 78.0488 kpps, [moves] = 111.131
>
>[game] = 78.0488 kpps, [moves] = 111.082
>
> 
>
>[legal] 110/51, [pick] 110/74, [play] 106/169, [score] 40, [win] 0.4187
>
>[legal] 111/51, [pick] 111/74, [play] 106/168, [score] 42, [win] 0.4201
>
>[legal] 111/52, [pick] 111/74, [play] 106/169, [score] 43, [win] 0.4276
>
>[legal] 111/52, [pick] 111/75, [play] 106/170, [score] 41, [win] 0.4092
>
>[legal] 111/51, [pick] 111/74, [play] 106/171, [score] 45, [win] 0.4221
>
> 
>
>finished, press any key to exit
>
> 
>
>Explanation of output:
>
>======================
>
> 
>
>There are three sets of repeated (5x) measurements, performed and timed
>
>independently to avoid cross-contamination of inserted serialization
>
>instructions and timing overhead (which still does not quite prevent
>
>reordering of instructions across timing boundaries).
>
> 
>
>[game]  cc's per full playout
>
>[moves] average number of moves per playout
>
> 
>
>[game]  thousands of playouts per second (kpps)
>
>[moves] average number of moves per playout
>
> 
>
>"calls per playout" / "cc's per call"
>
>[legal] identifies all legal moves on the board
>
>[pick]  picks a random move
>
>[play]  processes a move
>
>[score] scores a final position
>
>[win]   winrate for black
>
> 
>
>Acknowledgements:
>
>=================
>
> 
>
>Antoine de Maricourt, for his very helpful description of his bitmap go
>
>  implementation in detail on the computer-go mailing list:
>
>  http://computer-go.org/pipermail/computer-go/2006-November/007136.html
>
>Gunnar Farneback, for providing the well documented Brown engine,
>
>  whoms legal move definition is used in this mockup
>
>Lukasz Lew, for the high performance Libego implementation, which has
>inspired
>
>  various parts of this mockup as well
>
>Agner Fog, for the Mersenne Twister and clock-cycle timing library
>
>Computer-go mailing list, where many ideas are posted and discussed
>
> 
>
> 
>---- inline file
>_______________________________________________
>computer-go mailing list
>[email protected]
>http://www.computer-go.org/mailman/listinfo/computer-go/
--
[email protected] (Kato)
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to