Hi Jacques

> I got a lot of improvement from Rémi's Bradley-Terry
> ideas in move prediction (although with some
> overlearning which I didn't care much about as
> predicting moves is not my interest.) But neither
> the naive values (times played/times seen) nor the
> improved Bradley-Terry values are better in playouts
> than uniform random. They are 158 CI(114..202) Elo
> points worse!

So it seems this is not just me. I was kinda expecting an
almost-guaranteed increase in strength.
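For reference, this is roughly how I'm feeding the learned values into the
playout policy: sample each candidate move with probability proportional to
its pattern's gamma. A minimal sketch, not my actual code; the names here
(`sample_move`, `gamma_of`) are made up for illustration:

```python
import random

def sample_move(legal_moves, gamma_of):
    """Pick a playout move with probability proportional to its gamma.

    legal_moves: list of candidate moves
    gamma_of:    callable mapping a move to its (non-negative) gamma value
    """
    weights = [gamma_of(m) for m in legal_moves]
    total = sum(weights)
    if total <= 0:
        # no pattern matched anywhere: fall back to uniform random
        return random.choice(legal_moves)
    r = random.uniform(0, total)
    acc = 0.0
    for move, w in zip(legal_moves, weights):
        acc += w
        if r <= acc:
            return move
    return legal_moves[-1]  # numerical safety net
```

With all gammas equal this degenerates to uniform random, which is the
baseline we are both failing to beat.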

> a. Use small patterns (3x3) with all non-ill-formed
> patterns in the database. (Other databases have a value
> for "unknown" this one shouldn't.)

You mean all the 3x3 patterns? I'm only using 3x3 patterns that occur
at least a minimum number of times in my training collection.
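Concretely, my harvesting step is roughly the following. This is a
simplified sketch (board representation is ad hoc, and I've left out
canonicalising rotations/reflections/colour swaps):

```python
from collections import Counter

def pattern_key(board, x, y):
    """3x3 neighbourhood around (x, y) as a tuple; '#' marks off-board.

    board is a list of equal-length strings of '.', 'X' and 'O'.
    """
    cells = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx < len(board[0]) and 0 <= ny < len(board):
                cells.append(board[ny][nx])
            else:
                cells.append('#')
    return tuple(cells)

def harvest(positions, min_count=10):
    """Keep only the 3x3 patterns seen at least min_count times.

    positions: iterable of (board, (x, y)) pairs, where (x, y) is the
               move actually played in that training position.
    """
    counts = Counter(pattern_key(board, x, y) for board, (x, y) in positions)
    return {p: n for p, n in counts.items() if n >= min_count}
```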

> b. Classify patterns. I have done that in 40 classes.
> This way you reduce the amount of degrees of liberty.
> So your vector of gamma values is in R^40

I'm not following you here. What sort of classes? Degrees of liberty?
R^40? Would you mind explaining a bit for me?
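Just so I'm guessing in the right direction: do you mean something like
this, i.e. bucket every raw 3x3 pattern into a small number of classes and
learn one gamma per class, so the search is over a short vector rather than
one gamma per pattern? The class function below is entirely made up for
illustration:

```python
NUM_CLASSES = 40  # one gamma per class, so the parameter vector is in R^40

def classify(pattern):
    """Map a raw 3x3 pattern (tuple of 9 cells) to a class in [0, NUM_CLASSES).

    Made-up classification for illustration: combine a few coarse
    features (stone counts, touching the edge) instead of giving every
    raw pattern its own parameter.
    """
    own = sum(1 for c in pattern if c == 'X')
    opp = sum(1 for c in pattern if c == 'O')
    edge = '#' in pattern
    return (own * 4 + opp + (16 if edge else 0)) % NUM_CLASSES

gammas = [1.0] * NUM_CLASSES  # the gamma vector, initialised uniform

def gamma_of(pattern):
    return gammas[classify(pattern)]
```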

Your genetic algo looks interesting, but I have a feeling I'll get a
better return for my time by working on other things at the moment.
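(For my own notes, my reading of the procedure you describe is roughly:
perturb every gamma at random, keep the best-scoring candidates, nudge the
centre of gravity a small step toward them, and repeat. A sketch under that
assumption, with invented parameter names and a placeholder `score`
function standing in for your Elo test:)

```python
import random

def evolve(gammas, score, n_candidates=20, n_best=4, sigma=0.1,
           step=0.2, iters=10):
    """Random-perturbation search over a gamma vector.

    Every component gets independent random noise each round (so even
    correlated gammas see uncorrelated perturbations), the best scorers
    are kept, and the centre moves a small step toward their mean.
    score: callable evaluating a gamma vector, e.g. win rate in a match.
    """
    centre = list(gammas)
    for _ in range(iters):
        candidates = [
            [max(1e-6, g + random.gauss(0.0, sigma)) for g in centre]
            for _ in range(n_candidates)
        ]
        best = sorted(candidates, key=score, reverse=True)[:n_best]
        mean_best = [sum(c[i] for c in best) / n_best
                     for i in range(len(centre))]
        centre = [g + step * (m - g) for g, m in zip(centre, mean_best)]
    return centre
```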
--
Francois van Niekerk
Email: [email protected] | Twitter: @francoisvn
Cell: +2784 0350 214 | Website: http://leafcloud.com



On Thu, Dec 30, 2010 at 11:57 PM, Jacques Basaldúa <[email protected]> wrote:
> Hi Francois, Welcome
>
>
>> For reference I need about 100k playouts with
>> RAVE to get 50% winrate against GnuGo 3.8 L10.
>
> Yes that's more or less expected. At least before the
> "big" improvements (yet to come ;-)
>
> In my case I do a lot of testing at 4x10000 because
> the games are around 15 seconds long and I get fast
> Elo confidence intervals. At that rate, 40 K playouts/move,
> I get about 40-42% wins against GnuGo. This is more
> or less consistent with a debugged barebones engine without
> particular smarts (but with RAVE, without progressive
> widening). I guess 40% at 40000 scales to 50% near
> 100K, but the exact point where I reach 50% has not
> been studied, as I expect it to be much lower in the
> near future. (Optimistic)
>
>> The next step is obviously to apply these to the
>> playouts. I am currently testing my program with
>> the ELO features in the playouts, but unfortunately
>> the preliminary results don't look good.
>
> That's exactly my experience! Although you do get
> improvement from the extend-from-atari, capture, and
> distance-to-previous-move heuristics.
>
> The Mogo and CrazyStone papers report improvement
> "all features included" which is true because the
> other ideas produce improvement, but they don't
> give results for the patterns in isolation.
>
> I got a lot of improvement from Rémi's Bradley-Terry
> ideas in move prediction (although with some
> overlearning which I didn't care much about as
> predicting moves is not my interest.) But neither
> the naive values (times played/times seen) nor the
> improved Bradley-Terry values are better in playouts
> than uniform random. They are 158 CI(114..202) Elo
> points worse!
>
> That is good and bad news. Why should uniform random
> be the best? Obviously it is not. But what humans
> play lacks all the information about what they don't
> play, because it is obvious to them but not
> obvious to a "silly" playout policy.
>
> How to find good values for the patterns? (What I have
> tried.)
>
> a. Use small patterns (3x3) with all non-ill-formed
> patterns in the database. (Other databases have a value
> for "unknown"; this one shouldn't.)
>
> b. Classify patterns. I have done that with 40 classes.
> This way you reduce the number of degrees of freedom,
> so your vector of gamma values is in R^40.
>
> c. Then what? I really don't know. I have a "sort of
> genetic algorithm". I like the idea that everything
> changes at random, because the gammas are not
> independent, and this way the expected value of the
> correlation is zero even under stochastic dependence.
> Then I select the "best winners", move my center of
> gravity one little step in one or two classes of patterns,
> and repeat the entire process. Then test to see if there
> was improvement. A long process. I only won a little
> in the first iterations. After that, the apparent
> improvement wasn't verified against uniform random.
>
> In all, about 100 Elo points, less than the harm the
> "human patterns" do. I guess better playouts are a
> research area where there is room for improvement.
>
>
> Jacques.
>
>
>
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
