Hi Chuck,

Thank you for your interesting suggestions.  I have previously
considered a system where the distribution is based on how many
contestants there are.  For instance, if there are hundreds of players
you would want to generate best-of-5 or 6 or more, but if there were
only 3 or 4 you might want best-of-2.

But it's not just the number of players, either: if you have 1000
players but they are all within 50 Elo of each other, you could almost
choose your opponent randomly.

There is definitely a seeding problem, because each instance of the
tester is ignorant of the players' strengths.  The autotester
eventually discovers their approximate Elo level and then it's fine,
but only after a number of games, many of which are ugly mismatches.

However, when I add a player to the autotester I do try to guess a
reasonable rating in order to prevent this.  The autotester is not too
aggressive about rating players - they are allowed to change fairly
slowly, so that an initial seed will "stick."

I am definitely interested in a better pairing algorithm, and I have
plans to improve the autotester and distribute it as freeware when it's
easier to use (right now you must know SQL to really get good use out
of it).  What I actually have in mind is some kind of quick Monte Carlo
simulation at scheduling time, in order to make a decision that fits
whatever constraints we think are useful.

I won't have a chance to make any serious changes that require a lot
of testing in the near future, but when we do the next study I can at
least change the "N" in best-of-N to a higher number and decrease the
number of mismatches.

Something I was big on a year or two ago was to schedule as if the
players were playing a series of Swiss tournaments.  This is a very
natural system which mirrors how programs and people get paired in the
real world.  However, I ran some simulations, and the distribution is
horrible.  There are pockets of players you would rarely play and
other pockets you are very likely to play, and the frequency doesn't
closely match the strength of the players - in other words, you may be
much more likely to face a player 600 Elo distant than one 200 Elo
distant.

I ran this simulation 3 different ways.  One way was based on correct
pairings, where you take the ratings of the players into account (top
half plays bottom half in each scoring section of each round).
Another way was to assume nothing about their actual strengths and
pair them as if they were all playing in their first tournament ever.
A third way was to give them random ratings for pairing purposes, but
pair them the correct Swiss way as if these ratings were their actual
ratings.  All of these systems generated uneven pockets.
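For anyone curious what such a simulation looks like, here is a minimal sketch (my code, not the simulation Don actually ran; it assumes a 16-player 80-Elo ladder, 5-round tournaments, and "top half plays bottom half within each score group" pairing, with odd players floated down):

```python
import random
from collections import Counter

def swiss_opponent_counts(ratings, rounds=5, tournaments=2000, seed=0):
    """Count how often the strongest player (index 0) meets each
    opponent over many simplified Swiss tournaments."""
    rng = random.Random(seed)
    n = len(ratings)
    meet = Counter()  # opponent index -> games against player 0
    for _ in range(tournaments):
        scores = [0] * n
        for _ in range(rounds):
            by_score = {}
            for i in range(n):
                by_score.setdefault(scores[i], []).append(i)
            pairs, leftover = [], []
            for s in sorted(by_score, reverse=True):
                # sort the score group by rating; a floated player joins it
                group = leftover + sorted(by_score[s], key=lambda i: -ratings[i])
                leftover = [group.pop()] if len(group) % 2 else []
                half = len(group) // 2
                pairs.extend(zip(group[:half], group[half:]))  # top vs bottom half
            for a, b in pairs:
                # Elo win model decides the result
                p = 1 / (1 + 10 ** ((ratings[b] - ratings[a]) / 400))
                scores[a if rng.random() < p else b] += 1
                if 0 in (a, b):
                    meet[a + b] += 1  # a + b is the opponent's index when one is 0
    return meet

ratings = [2000 - 80 * i for i in range(16)]  # 80-Elo ladder, as in the study
counts = swiss_opponent_counts(ratings)
print(sorted(counts.items()))  # opponent frequencies, far from smooth in rank
```

Plotting the resulting frequencies against rank distance is the quickest way to see the "pockets" Don describes.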


- Don



Chuck Paulson wrote:
>
> I have been following the scaling study closely. Thanks to everyone
> for gathering such interesting data, and especially to Don and the
> people donating computers.
>
>  
>
> I have 2 suggestions that could help increase the amount of
> information gathered from all the CPU hours being used to play these
> games. The first suggestion is about changing the algorithm used to
> pair players together, and the second is about “seeding” players to an
> appropriate starting position.
>
>  
>
> Suppose we have two players A and B who each have about a 50% chance
> of winning when they play each other. We expect to gain about
> -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1 bit of information from every
> game they play. On the other hand, if player A has a 95% chance of
> beating player B, we get about -(0.95*log2(0.95) + 0.05*log2(0.05)) =
> about 0.29 bits of information when they play each other. Therefore,
> the more often you are playing nearly equal players, the more
> information is gained.
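[This is just the binary entropy of the win probability; a quick check of Chuck's two numbers (the helper name is mine):]

```python
import math

def bits_per_game(p):
    """Expected information (binary entropy) from one game,
    given win probability p for one of the players."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain result carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(round(bits_per_game(0.50), 4))  # evenly matched: 1.0 bit
print(round(bits_per_game(0.95), 4))  # 95% favourite: ~0.2864 bits
```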
>
>  
>
> Now suppose we have a list of players ranked 0 to N, where the lower
> the ranking, the stronger the player. Assume that player n takes twice
> as long to play a game as player n+1 for any n. Also assume that the
> Elo difference between player n and player n+1 is 80 points, and that
> the cost of player 0 playing a game is 1.0 time units. These
> assumptions seem reasonable for the upper levels of the study if we
> only consider the Mogo programs up to Mogo_16 (the ones that have
> about 500 games). Using these assumptions we can build the following
> table.
>
>  
>
> Rank   Cost     WinProb   Bits     Bits/Cost
>
> 0      2.0000   0.5000    1.0000   0.5000
> 1      1.5000   0.3869    0.9627   0.6418
> 2      1.2500   0.2847    0.8618   0.6895
> 3      1.1250   0.2008    0.7234   0.6431
> 4      1.0625   0.1368    0.5758   0.5419
> 5      1.0313   0.0909    0.4395   0.4262
> 6      1.0156   0.0594    0.3249   0.3199
> 7      1.0078   0.0383    0.2344   0.2326
> 8      1.0039   0.0245    0.1660   0.1654
> 9      1.0020   0.0156    0.1160   0.1157
> 10     1.0010   0.0099    0.0801   0.0801
>
>  
>
> For example, if player 0 plays player 1 then the total cost is 1.5,
> because player 0 takes 1 unit of time and player 1 takes 0.5 (half of
> player 0's time), for a total cost of 1.5 units of time. The Elo
> difference is 80, so using the formula P = 1 / (1 + 10^(80/400)) =
> 0.3869 we see that player 1 has about a 39% chance of winning. The
> bits of information we expect to get are -(0.3869*log2(0.3869) + (1 -
> 0.3869)*log2(1 - 0.3869)) = 0.9627. The Bits/Cost is 0.9627/1.5 =
> 0.6418. The table shows the results of player 0 playing all players
> ranked 0 to 10 ranks lower. If player 0 plays himself the cost is 2.0,
> the probability of a win is 50%, resulting in 1 bit of information and
> a Bits/Cost of 0.5.
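[The whole table can be regenerated from the stated assumptions (80 Elo per rank, cost halving per rank) in a few lines - a sketch, not code from the study:]

```python
import math

def entropy(p):
    """Binary entropy: expected bits of information per game."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(f"{'Rank':<6}{'Cost':<10}{'WinProb':<10}{'Bits':<10}Bits/Cost")
for rank in range(11):
    cost = 1.0 + 2.0 ** -rank                       # player n costs 2^-n units
    p = 1.0 / (1.0 + 10.0 ** (80 * rank / 400.0))   # Elo win probability
    bits = entropy(p) if 0.0 < p < 1.0 else 0.0
    print(f"{rank:<6}{cost:<10.4f}{p:<10.4f}{bits:<10.4f}{bits / cost:.4f}")
```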
>
>  
>
> From the table, to maximize the information gain per unit of time, the
> best games are played between players that are 2 ranks apart. That has
> to be balanced against having a wide variety of games to get robust
> rankings.
>
>  
>
> I believe the current method is to pick 3 random opponents and choose
> the one closest to the player. If there are N players, then the top
> player would expect his average opponent to be about N / (3 + 1) ranks
> down the list. Since there are 34 players, the top player will play
> the player 8.25 ranks down on average. From the table, this would
> result in about 0.166 bits per game (0.1654 bits per unit of cost).
> The actual rate is higher than this because a range of opponents are
> played, some closer than 8.25 ranks. Also notice that the larger the
> pool of players, the further away in rank the top player's average
> opponent will be.
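[A quick Monte Carlo check of that estimate. The message doesn't say whether the 3 candidates can repeat; this sketch assumes they are distinct, which gives an expected nearest rank of (33 + 1) / (3 + 1) = 8.5, in line with the rough N/4 figure:]

```python
import random

# "Pick 3 random opponents, keep the closest" for the top player of a
# 34-player pool (ranks 1..33 below them), assuming distinct candidates.
random.seed(0)
trials = 200_000
total = 0
for _ in range(trials):
    candidates = random.sample(range(1, 34), 3)  # 3 distinct random opponents
    total += min(candidates)                     # the closest one in rank
print(total / trials)  # close to (33 + 1) / (3 + 1) = 8.5
```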
>
>  
>
> I propose that, instead of the current method, players play an
> opponent R ranks away, where R follows a normal distribution with
> standard deviation 3. If there is no player at the suggested rank,
> generate another rank and try again. For example, the top player has
> no player ranked higher, so if R points above the top of the list,
> keep generating new ranks until a valid lower rank is produced. This
> method is not affected by the size of the player pool and also results
> in the average opponent being much closer than 8.25 ranks away.
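[That draw could be sketched like this (my code, not part of the autotester; I round the Gaussian draw to the nearest integer and treat a zero or out-of-range offset as "no player at the suggested rank"):]

```python
import random

def pick_opponent(my_rank, num_players, sigma=3.0, rng=random):
    """Draw a rank offset R ~ Normal(0, sigma), rounded to a nonzero
    integer; resample until the target rank actually exists."""
    while True:
        r = round(rng.gauss(0.0, sigma))
        if r == 0:
            continue                      # no self-play
        opponent = my_rank + r
        if 0 <= opponent < num_players:   # resample if off either end
            return opponent

random.seed(42)
print(pick_opponent(0, 34))  # top player: only downward offsets are valid
```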
>
>  
>
> This should result in more information being gained from each game,
> but still preserve the property that any player has a chance of
> playing any other player, although if they’re not within about 2.5
> standard deviations (3 * 2.5 = 7.5 ranks) then it becomes a less than
> 1% chance.
>
>  
>
> The second suggestion is that when some information is known about new
> players being introduced into the ladder, they could be “seeded” into
> an appropriate rank instead of having to play their way up the ladder.
> For example, if we assume Mogo_16 is about 80 Elo points stronger than
> Mogo_15 then we could pretend that Mogo_16 has 2 wins and 1 loss
> against Mogo_15 before any actual games are played. This would seed
> Mogo_16 into about the right rank, and it would immediately start
> playing high information games instead of games against opponents that
> are too weak to yield much information.
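[The virtual-record trick is easy to sanity-check: an 80-point gap predicts an expected score of about 0.61, while a 2-1 record implies roughly 120 Elo, so 2 wins and 1 loss is a coarse but workable prior. A sketch of the arithmetic (not the autotester's actual seeding code):]

```python
import math

def expected_score(elo_diff):
    """Expected score for the stronger player at a given Elo gap."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def implied_elo(wins, losses):
    """Elo gap implied by a (virtual) win/loss record."""
    s = wins / (wins + losses)
    return 400.0 * math.log10(s / (1.0 - s))

print(round(expected_score(80), 3))  # ~0.613
print(round(implied_elo(2, 1), 1))   # ~120.4
```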
>
>  
>
> Chuck Paulson
>
> www.PuffinwareLLC.com <http://www.puffinwarellc.com/>
>
>  
>
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
