Re: [computer-go] Fast Board implementation

Don Dailey Fri, 15 Dec 2006 15:26:28 -0800

Another report on D programming.   It appears that D programs are about
1.5X slower in general - at least for what I'm doing.


At one point I reported about 7% slower, but that was a mistake on my
part.

The difference in programming effort and just plain clean code is quite
large - D was well designed and after doing a lot of programming in D,
it's a real drag going back to C!    

But it appears that you can have your cake and eat it too by just making
a low level library in C,  and doing most of the other stuff in D.    D
and C are link compatible and there is no effort in making this work -
it just works.  

I have indeed been able to experiment more quickly and easily doing it
in D, so I have come up with what seems like a nice formula:

  1.  Start by doing everything in D.
  2.  Optimize and experiment.
  3.  When you are happy,  redo the low level stuff in C.   It will
link 
      right in.
  4.  You can still continue to experiment.

Porting to C from D is quite easy because D is pretty much a cleaned up
version of C.   It's easy to make this port when you know exactly what
you want.

I've been able to do a substantial amount of experimentation using D and
the D program is now just slightly faster than the old C program due to
the improved
data structures.    I should be able to get close to 25,000 games per
second if
I get the 1.5 X improvement going to C.   This is only about 3K of code
that will
have to be written in C,  where much of the code is identical to C
anyway.

- Don


On Thu, 2006-12-14 at 22:51 -0500, Luke Gustafson wrote:
> Are you using the union-find algorithm with path compression for joining 
> strings?  It's very simple and very fast.
> http://en.wikipedia.org/wiki/Disjoint-set_data_structure#Disjoint-set_forests
> 
> The other main thing to consider is reducing branch mispredictions, which 
> (I'm guessing) could very well use more clock cycles on a P4 than performing 
> the actual calculations themselves.  I think this is why the next_stone list 
> is faster than using e.g. a recursive algorithm, which involves a lot of 
> if's that cannot be predicted.
> 
> Compact data I think is unnecessary, although perhaps eliminating the String 
> struct would speed things up a tad.  In fact, on a P4, shift operations are 
> slower than L1 cache memory retrievals, so fancy packing of data is not 
> worthwhile.  As long as all data fits in the L1 cache it should be max 
> speed.
> 
> I am working on a program myself, and I will be porting it to assembly at 
> some point; right now I am just experimenting with the algorithm design in 
> C.  (Although this happens pretty slowly since I am very busy with work.)  I 
> think that this sort of algorithm is perfect for assembly language, where 
> hand-optimization can be much better than compiler optimization.  I will let 
> you know how it turns out compared to Lukasz :)
> 
> >> At the time of Your post I've had it already implemented and regarded
> >> it like "my sweet secret" :)
> >
> > I don't know how sweet this secret is, but it does help.  I just
> > implemented it in Lazarus and I get about a 20% speedup.  That's not
> > bad, but nothing to write home about.  To a chess programmer this is
> > rather enormous!  I am guessing that the win would be much greater on
> > bigger boards.
> >
> > In principle it would probably be FAR faster, but it adds extra
> > complexity,
> > operations and overhead to executing a move which subtracts some from
> > the
> > savings.
> >
> > The question is: "what is the best way to implement it?"  The problem
> > is that you now must track pseudo liberty counts for every chain on
> > the board.  This is more complicated obviously.
> >
> > Every time a stone is placed or removed, 4 additional entries must be
> > accessed in order to update the pseudo liberty counts.
> >
> > It's still a good trade-off because the testing for whether a move
> > is a capture or suicide is cheap.
> >
> > I'm reviewing my design now and I think I see some ways to improve.
> >
> > Here is the current  data structure (assuming the 9x9 board):
> >
> >  1.  A 10x11 board     (actually 10x11 plus 1 for diagonal checking.)
> >          0 = empty
> >          1 = white
> >          2 = black
> >          4 = off-board points.
> >
> >  2.  Another 10x11 board-like array - if there is a occupied point on
> >      the board this array contains the index (hash key) of a "string"
> >      data type.
> >
> >  3.  A list of string data types (a struct in C):
> >        1.  stone count
> >        2.  pseudo liberties for the whole string
> >        3.  a reference stone.
> >
> > To save time, the list of strings is really more like a hash table,
> > not a proper list.  It's a perfect hash table because the keys will
> > never collide.  I use the reference stone as the key for each string
> > which is generally the first stone placed in the string.  So the hash
> > key computation is just a table lookup and is the only thing the
> > second table is used for.
> >
> > So to summarize, I really have 3 10x11 arrays, one of them is an array
> > of structs in C (actually in D) and is viewed as a hash table.
> >
> > I could pack all of the information into a single array.  Here is one
> > possible scheme which fits in 32 bits and accommodates all board sizes:
> >
> >  3 bits to identify white/black/empty/off-board
> >  9 bits for the hash key (the reference stone)
> >  9 bits for pseudo liberty count
> >  9 bits for stone count
> >
> > It's kind of interesting when you think about it.   A single array does
> > triple duty as a:
> >
> >    1. Go board.
> >    2. Hash table.
> >    3. lookup table - to get the "hash key" (table index) of the data
> > for
> >       the string that occupies this point.
> >
> > The hash table bits in the table are usually irrelevant, it is only
> > valid for the single point that represent the hash table entry for a
> > given string.
> >
> > So this scheme does no list manipulations and has a fixed size in
> > memory.  It's compact and I think fast.  Is there anything obvious I'm
> > missing that simplifies things?
> >
> > By the way, stone count field is helpful but not necessary.  But it's
> > trivial to keep updated and it's convenient.
> >
> > Using the compact representation would slow down some of the
> > operations which are in fast inner loops now, but would probably be a
> > win overall, perhaps a big win because I'm not manipulating several
> > different arrays.
> >
> > So I will also try this compact implementation unless someone has a
> > superior scheme to suggest.
> >
> >
> > - Don
> >
> >
> >
> >
> >
> >
> >
> > On Mon, 2006-12-11 at 20:00 +0100, Łukasz Lew wrote:
> >> At the time of Your post I've had it already implemented and regarded
> >> it like "my sweet secret" :) , so I was expecting that when You
> >> published it everybody will appreciate, and start using. But I guess
> >> it wasn't easy to see benefits.
> >> I wonder how many really good ideas came through this list unnoticed.
> >>
> >> Best Regards,
> >> Lukasz
> >>
> >>
> >> On 12/11/06, House, Jason J. <[EMAIL PROTECTED]> wrote:
> >> > Wow.  Did some of my early posts on liberties/chains actually get used
> >> > by someone?  Or did the idea of pseudo-liberties and union sets exist
> >> > before my post(s) on the topic?  I remember a lot of negative feedback
> >> > on pseudo-liberties at the time of my post.
> >> >
> >> > -----Original Message-----
> >> > From: [EMAIL PROTECTED]
> >> > [mailto:[EMAIL PROTECTED] On Behalf Of Lukasz Lew
> >> > Sent: Monday, December 11, 2006 1:31 PM
> >> > To: [EMAIL PROTECTED]
> >> > Cc: computer-go
> >> > Subject: Re: [computer-go] Fast Board implementation
> >> >
> >> > Pseudo liberties of a group is a sum of liberties of each stone,
> >> > example:
> >> >
> >> > OOOOO
> >> > OXXXO
> >> > OX.XO
> >> > OXXXO
> >> > OOOOO
> >> >
> >> > "X" group have 4 pseudo liberties.
> >> >
> >> > If You merge two groups just add pseudo liberties.
> >> > If PL = 0 then group should be removed.
> >> >
> >> > This is simple and sufficient :)
> >> >
> >> > Lukasz
> >> > On 12/11/06, Don Dailey <[EMAIL PROTECTED]> wrote:
> >> > >
> >> > >
> >> > > On Mon, 2006-12-11 at 18:22 +0100, Łukasz Lew wrote:
> >> > > > - pseudo liberties at top of union-set tree
> >> > >
> >> > >
> >> > > Refresh my memory on this.   I remember talking about this a long
> >> > time
> >> > > ago.  A psuedo liberty is an upper bound on how many liberties there
> >> > are
> >> > > for a given string, correct?   Sometimes a liberty gets counted twice
> >> > or
> >> > > more?
> >> > >
> >> > > But if it goes to zero or one it's correct for any given string?
> >> > >
> >> > > Is the idea that it's lightning fast to update incrementally?
> >> > >
> >> > > - Don
> >> > >
> >> > >
> >> > >
> >> > _______________________________________________
> >> > computer-go mailing list
> >> > [email protected]
> >> > http://www.computer-go.org/mailman/listinfo/computer-go/
> >> >
> >> _______________________________________________
> >> computer-go mailing list
> >> [email protected]
> >> http://www.computer-go.org/mailman/listinfo/computer-go/
> >
> > _______________________________________________
> > computer-go mailing list
> > [email protected]
> > http://www.computer-go.org/mailman/listinfo/computer-go/ 
> 

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Fast Board implementation

Reply via email to