If an engine (or an engine's playout policy) needs to check the legality of 
every move before making a selection, this could still be a benefit.

René van de Veerdonk wrote:
David,

Confession: I have not tested 19x19. As you have noted, and others before you over the years, a 19x19 board does not fit in one but three 128-bit registers and there would be a rather big penalty as a result, perhaps (likely?) wiping out all of the benefits of bitmaps. Antoine voiced his disappointment about the speed advantage even on 9x9 in the e-mail to the list that served as my basis. My hope, as Hideko Kato points out, is in longer registers or perhaps being able to port this to a GPU. A 64-bit OS with twice the number registers would also surely help out. Nevertheless, I was surprised that I was able to get to almost the same level of speed as Libego provides.

The goals for this mockup were very humble: (1) to provide a basic implementation to see what it can do (and if I could do it), and (2) learn how to work with assembly and/or intrinsic functions (I had never done that before). It may not be too hard to try 19x19 and just see what happens. I may attempt this as a follow-up.

René van de Veerdonk

2009/8/23 David Fotland <fotl...@smart-games.com <mailto:fotl...@smart-games.com>>

    How much would you lose for 19x19 board?  A board representation is
    not very interesting unless it scales to 19 line boards.

    David

    *From:* computer-go-boun...@computer-go.org
    <mailto:computer-go-boun...@computer-go.org>
    [mailto:computer-go-boun...@computer-go.org
    <mailto:computer-go-boun...@computer-go.org>] *On Behalf Of *René
    van de Veerdonk
    *Sent:* Sunday, August 23, 2009 1:11 PM
    *To:* computer-go@computer-go.org <mailto:computer-go@computer-go.org>
    *Subject:* [computer-go] Bitmap Go revisited and mockup implementation

    Hi all,

    After years of reading this very interesting list, I decided to make
    my first contribution this week after reading once again about
    bitmap go. There is no freely available implementation of a bitmap
    based engine that I know of, so here is a start. Note that I am not
    a professional programmer, self-taught, and this is a part-time
    hobby, so do not expect perfect form.

    The attachment is an attempt to create an efficient implementation
    of a bitmap based light playout engine, following mostly the
    recommendations as spelled out by Antoine de Maricourt in his
    November 2006 message to this list. The application is single
    threaded and achieves around 75-80 kpps on my Centrino Duo 2.4 GHz
    Intel labtop. The mockup was developed in C/C++ using the Microsoft
    Visual Studio 2008 IDE for Windows XP (32-bit) and undoubtedly
    suffers from portability issues as a result. The 128-bit SSE
    instructions are called through intrinsic functions, hopefully at
    least these interfaces are the same for other compilers. The single
    goal of this mockup was to do some benchmarking, so there is no
    gtp-interface, but that would be rather easy to add for those
    skilled in the art. The standard rules are used with one exception,
    the Brown eye-rule is used. Multi-stone suicide is explicitly
    forbidden. No superko checks are performed during playouts, and
    there is no mercy-rule either.

    I would be particularly interested to know if a 64-bit OS will
    improve the performance significantly, as it does have a lot less
    register pressure. Moving to a 64-bit OS will also make some
    instructions available that allow for further
    simplifications/efficiency improvements in the code. As will
    enabling SSE3 or SSE4.1 instructions.

    One note, the random number generator used is from Agner Fog's
    public library and was not included in the attachment. You will have
    to provide a random number generator yourself or download the
    appropriate version from his well-known website
    (www.agner.org/optimize <http://www.agner.org/optimize>)

    I hope this is the start of a discussion about further improvements
    that can be made, but you may use this code as you wish with no
    restrictions or guarantees. See the readme.txt file included in the
    attachment for more details (also partially included below).

    Comments and feedback welcome,

    René van de Veerdonk

    Excerpts from the readme.txt file:

    Introduction:

    =============

    Mockup of a high performance implementation of the game of Go using
    bitmaps

    as the principle datastructures. This implementation is limited to
    measuring

    the performance of the principle components by playing random
    playouts from

    an empty 9x9 board, using Chinese scoring, not allowing pass moves until

    no other moves are available. Suicide is not allowed, neither for single

    or multi-stone groups. No checks are implemented for superko, but simple

    ko's are disallowed explicitly.

    Some implementation details:

    ============================

    Benchmarking is done in test_board.cpp, where the number of playouts
    (n) and

    repeats (j) for each test is set. Benchmarking is done starting from
    an empty

    9x9 board until two (three with ko) passes are played in sequence. The

    benchmark playout follow the same scheme as method
    board_t::play_random_game,

    but with timing instrumentation added.

    Clock counts (cc) are measured using the cc_time_of macro. Total
    run-time

    measurement relies on the windows library.

    For each move during a playout, the program goes through a sequence of

    (1) identify all legal moves (~50cc)

        this follows Brown's standard definition; this definition
    differs from

        most Monte-Carlo engines and has a different set of pros and cons;

        bitmaps allow all moves to be processed in parallel

    (2) pick a uniformly random move from the available choices (~75cc)

        relies on a modulus operation; is not 100% uniform to save speed

    (3) playing it on the board (~170cc)

        lengthiest method; but rather straightforward

    Playouts end after 256 moves or whenever two passes are played in
    succession

    (three in case of a ko). The routines for each of these basic steps are

    contained in the file board.cpp. The underlying datastructures are
    128-bit

    bitmaps, for which the sse2 instructions used to manipulate them are
    contained

    in the file bitmap.cpp.

    Further improvements can be expected from using any 64-bit OS that
    supports

    several more assembly instructions, and from using sse4.1 instructions.

    Example output (on a Intel 2.40 GHz Centrino Duo labtop):

    =========================================================

    [game] = 30469.7, [moves] = 111.071

    [game] = 30245.4, [moves] = 111.068

    [game] = 30221.7, [moves] = 111.089

    [game] = 30264.8, [moves] = 111.122

    [game] = 30205.8, [moves] = 111.101

    [game] = 77.1084 kpps, [moves] = 111.023

    [game] = 78.0488 kpps, [moves] = 111.045

    [game] = 77.1084 kpps, [moves] = 111.046

    [game] = 78.0488 kpps, [moves] = 111.131

    [game] = 78.0488 kpps, [moves] = 111.082

    [legal] 110/51, [pick] 110/74, [play] 106/169, [score] 40, [win] 0.4187

    [legal] 111/51, [pick] 111/74, [play] 106/168, [score] 42, [win] 0.4201

    [legal] 111/52, [pick] 111/74, [play] 106/169, [score] 43, [win] 0.4276

    [legal] 111/52, [pick] 111/75, [play] 106/170, [score] 41, [win] 0.4092

    [legal] 111/51, [pick] 111/74, [play] 106/171, [score] 45, [win] 0.4221

    finished, press any key to exit

    Explanation of output:

    ======================

    There are three sets of repeated (5x) measurements, performed and timed

    independently to avoid cross-contamination of inserted serialization

    instructions and timing overhead (which still does not quite prevent

    reordering of instructions across timing boundaries).

    [game]  cc's per full playout

    [moves] average number of moves per playout

    [game]  thousands of playouts per second (kpps)

    [moves] average number of moves per playout

    "calls per playout" / "cc's per call"

    [legal] identifies all legal moves on the board

    [pick]  picks a random move

    [play]  processes a move

    [score] scores a final position

    [win]   winrate for black

    Acknowledgements:

    =================

    Antoine de Maricourt, for his very helpful description of his bitmap go

      implementation in detail on the computer-go mailing list:

      http://computer-go.org/pipermail/computer-go/2006-November/007136.html

    Gunnar Farneback, for providing the well documented Brown engine,

      whoms legal move definition is used in this mockup

    Lukasz Lew, for the high performance Libego implementation, which
    has inspired

      various parts of this mockup as well

    Agner Fog, for the Mersenne Twister and clock-cycle timing library

    Computer-go mailing list, where many ideas are posted and discussed


    _______________________________________________
    computer-go mailing list
    computer-go@computer-go.org <mailto:computer-go@computer-go.org>
    http://www.computer-go.org/mailman/listinfo/computer-go/



------------------------------------------------------------------------

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to