Re: [Bug-gnubg] "Joseph-ID" in benchmark db

Mark Higgins Sun, 12 Feb 2012 08:13:37 -0800

My best player (TD trained, race & contact networks, a couple extra inputs 
beyond the standard Tesauro ones) has an average error of 0.0164 ppg/move on the 
contact set, so, not surprisingly, it is worse than GNUbg (I assume 1125 means 
0.01125 ppg/move?).

I also was curious which benchmark set was most relevant for predicting match 
score, since of course a real game is a mixture of the positions. I took a 
bunch of my players, of varying skills, and calculated the average error rate 
for the three benchmark sets; and also played each against PubEval for 40k 
cubeless money games. Then I regressed the score in those games against the 
benchmark ERs to see which was most important (using R^2 as a proxy for 
importance).
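
A rough sketch of that comparison, in Python. It assumes one single-variable 
least-squares fit (with intercept) per benchmark set, for which R^2 is just the 
squared correlation; the function name and the argument layout are made up for 
illustration, and the actual regression setup may have differed.

```python
from typing import Sequence


def r_squared(error_rates: Sequence[float], scores: Sequence[float]) -> float:
    """R^2 of a simple linear fit of scores on error_rates.

    error_rates: one benchmark error rate per player (hypothetical layout).
    scores:      the same players' average score against a reference player.
    """
    n = len(error_rates)
    if n != len(scores) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")

    mean_x = sum(error_rates) / n
    mean_y = sum(scores) / n

    # Sums of squares and cross products around the means.
    sxx = sum((x - mean_x) ** 2 for x in error_rates)
    syy = sum((y - mean_y) ** 2 for y in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(error_rates, scores))

    if sxx == 0.0 or syy == 0.0:
        # A constant sample explains nothing (and would divide by zero).
        return 0.0

    # For a one-variable fit with intercept, R^2 equals correlation squared.
    return (sxy * sxy) / (sxx * syy)
```

Calling it once per benchmark set (contact, crashed, race) with the same list 
of scores gives three R^2 values that can be ranked against each other.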

Turns out the contact benchmark is most relevant, followed by crashed. Race is 
not that important.

Details here:

http://compgammon.blogspot.com/2012/02/gnubg-benchmark-results.html

On Feb 12, 2012, at 8:51 AM, Øystein Schønning-Johansen wrote:

> I've looped through all 'm'-positions the following way:
> 
> For each position I find the best move with my evaluator, and check whether my 
> move is among the candidates in the list of moves. If it is not the 
> best move, I add its error to the total. If my evaluator's move is not among 
> the candidates at all, I assign the same error as the worst move among the 
> candidates. 
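
A minimal sketch of the scoring loop described above, in Python. The data 
layout and all names here are hypothetical (they are not taken from the 
benchmark files or from perr.py): each position is assumed to carry a dict that 
maps a candidate move to its rolled-out error relative to the best candidate, 
so the best move has error 0.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: (position id, {candidate move: error vs. best move}).
Position = Tuple[str, Dict[str, float]]


def benchmark_error(
    positions: List[Position],
    choose_move: Callable[[str], str],
) -> Tuple[float, float]:
    """Return (total error, average error per position).

    choose_move stands in for the evaluator being tested: given a position
    id it returns the move that evaluator would play.
    """
    total = 0.0
    for position_id, candidates in positions:
        move = choose_move(position_id)
        if move in candidates:
            # Error of the chosen move; 0.0 if it is the best candidate.
            total += candidates[move]
        else:
            # Move not in the candidate list: charge the worst listed error.
            total += max(candidates.values())

    average = total / len(positions) if positions else 0.0
    return total, average
```

The average is the total divided by the number of positions; whether the 
reported figure is scaled (for example by 1e5) is an assumption to check 
against the benchmark tools.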
> 
> For all positions in contact.bm, GNU Backgammon will have an error of about 
> 1125 (IIRC).
> 
> Please report how your players do.
> 
> -Øystein
> 
> 
> 2012/2/12 Mark Higgins <[email protected]>
> Does anyone have the average error stats for 0-ply gnubg on the contact 
> benchmarks?
> 
> I see race & crashed results at Joseph's page here:
> 
> http://homepages.ihug.co.nz/~peps/ngb/index-top.html
> 
> but can't find the contact result anywhere. (Though I'd guess it's pretty 
> close to the crashed error, i.e. around 0.01 ppg/move.)
> 
> 
> 
> 
> On Feb 10, 2012, at 6:26 AM, Øystein Schønning-Johansen wrote:
> 
>> 'r' is the seed used for the rollout, I think
>> 
>> Sounds likely, since there is an 'r'-line for every rollout result. But there 
>> are no code lines in perr.py to confirm it. I guess we can trust your memory 
>> on that.
>> 
>> 'o' is the cube rollout, and the numbers are the rollout values of the 
>> outcome probability,
>> 
>> -Øystein
> 
>

_______________________________________________
Bug-gnubg mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/bug-gnubg
