i agree with everything except for the seeding.  it takes very few games
(especially with the distribution you suggest) to get somewhat near the
right spot.  with 500-ish games being played per player, the initial time
to get into the right place isn't unreasonable.

keep in mind that if you totally outmatch someone, likely the time they
spend thinking will be peanuts compared to you, because we're looking
at doublings.  so those initial "where do i roughly belong" games will go
relatively quickly.

another way of thinking about this is that if you use a uniform prior,
the convergence to the actual value isn't much slower than if you
use a (potentially biased) prior.  fat14, mogos 17, 18, big16 and potentially
big18 might have all been seeded too high this way, and they're the
ones we want the most accurate numbers on at this point.

s.

----- Original Message ----
From: Chuck Paulson <[EMAIL PROTECTED]>
To: computer-go <computer-go@computer-go.org>
Sent: Saturday, February 2, 2008 1:02:15 PM
Subject: [computer-go] Scalability study suggestion





 
 

<!--

 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;margin-bottom:.0001pt;font-size:12.0pt;font-family:"Times 
New Roman";}
a:link, span.MsoHyperlink
        {color:blue;text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;text-decoration:underline;}
span.EmailStyle17
        {font-family:Arial;color:windowtext;}
 _filtered {margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
        {}
-->






I have been following the scaling study closely. Thanks for everyone
for gathering such interesting data and especially to Don and the people
donating computers.
 

  
 

I have 2 suggestions that could help increase the amount of information
gathered from all the CPU hours being used to play these games. The first
suggestion is about changing the algorithm used to pair players together, and
the second is about “seeding” players to an appropriate starting
position.
 

  
 

Suppose we have two players A and B who each have about a 50% chance of
winning when they play each other. We expect to gain about 0.5*log2(0.5) +
0.5*log2(0.5) = 1 bit of information from every game they play. On the other
hand, if player A has a 95% chance of beating player B we get about
0.95*log2(0.95) + 0.05*log2(0.05) = about 0.29 bits of information when they
play each other. Therefore, the more often you are playing nearly equal players,
the more information is gained.
 

  
 

Now suppose we have a list of players ranked 0 to N, where the lower
the ranking, the stronger the player. Assume that player n takes twice as long
to play a game as player n+1 for any n. Also assume that the Elo difference
between player n and player n+1 is 80 points, and that the cost of player 0
playing a game is 1.0 time units. These assumptions seem reasonable for the
upper levels of the study if we only consider the Mogo programs up to Mogo_16
(the ones that have about 500 games). Using these assumptions we can build the
following table.
 

  
 

Rank    Cost                 WinProb          Bits                  Bits/Cost
 

0          2.0000             0.5000             1.0000             0.5000
 

1          1.5000             0.3869             0.9627             0.6418
 

2          1.2500             0.2847             0.8618             0.6895
 

3          1.1250             0.2008             0.7234             0.6431
 

4          1.0625             0.1368             0.5758             0.5419
 

5          1.0313             0.0909             0.4395             0.4262
 

6          1.0156             0.0594             0.3249             0.3199
 

7          1.0078             0.0383             0.2344             0.2326
 

8          1.0039             0.0245             0.1660             0.1654
 

9          1.0020             0.0156             0.1160             0.1157
 

10        1.0010             0.0099             0.0801             0.0801
 

  
 

For example, if player 0 plays player1 then the total cost is 1.5
because player 0 takes 1 unit of time and player 1 takes 0.5 (half of player 0
time) for a total cost of 1.5 units of time. The Elo difference is 80, so using
the formula P = 1 / (1 + 10^(80/400)) = 0.3869 we see that player 1 has about a
39% change of winning. The bits of information we expect to get are
0.3869*log2(0.3869) + (1 – 0.3869)*log2(1 – 0.3869) = 0.9627. The
Bits/Cost is 0.9627/1.5 = 0.6418. The table shows the results of player 0
playing all players ranked 0 to 10 ranks lower. If player 0 plays himself the
cost is 2.0, the probability of a win is 50%, resulting in 1 bit of information
and the Bits/Cost is 0.5.
 

  
 

From the table, to maximize the information gain per unit of time, the
best games are played between players that are 2 ranks apart. That has to be
balanced against having a wide variety of games to get robust rankings.
 

  
 

I believe the current method is to pick 3 random opponents and choose
the one closest to the player. If there are N players, then the top player
would expect his average opponent to be about N / (3 + 1) ranks down the list.
Since there are 34 players the top player will play the player 8.25 ranks down
on average. From the table, this would result in about 0.1654 bits per game. The
actual rate is higher than this because a range of opponents are played, some
closer than 8.25 ranks. Also notice that the larger the pools of players, the
further away in rank the top player’s average opponent will be.
 

  
 

I propose that instead of the current method, if players play an
opponent R ranks away where R follows a normal distribution with standard
deviation 3. If there is no player at the suggested rank, generate another rank
and try again. For example, the top player has no player ranked higher, so if a
positive R is generated, keep generating new ranks until a negative rank R is
generated. This method is not affected by the size of the player pool and also 
results
in the average opponent being much closer than 8.25 ranks away.
 

  
 

This should result in more information being gained from each game, but
still preserve the property that any player has a chance of playing any other
player, although if they’re not within about 2.5 standard deviations (3 *
2.5 = 7.5 ranks) then it becomes a less than 1% chance.
 

  
 

The second suggestion is that when some information is known about new
players being introduced into the ladder, they could be “seeded”
into an appropriate rank instead of having to play their way up the ladder. For
example, if we assume Mogo_16 is about 80 Elo points stronger than Mogo_15 then
we could pretend that Mogo_16 has 2 wins and 1 loss against Mogo_15 before any
actual games are played. This would seed Mogo_16 into about the right rank, and
it would immediately start playing high information games instead of games
against opponents that are too weak to yield much information.
 

  
 

Chuck Paulson
 

www.PuffinwareLLC.com
 

  
 






-----Inline Attachment Follows-----

_______________________________________________
computer-go 
mailing 
list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/






      
____________________________________________________________________________________
Looking for last minute shopping deals?  
Find them fast with Yahoo! Search.  
http://tools.search.yahoo.com/newsearch/category.php?category=shopping
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to