RE: [computer-go] Re: fuego strength

David Fotland Wed, 24 Jun 2009 09:15:21 -0700

I can have a reduced version of Many Faces up all the time on an old
computer, but I don't monitor it, so someone would have to email and remind
me when it goes down (usually because of a Microsoft automatic reboot :( )


David

> -----Original Message-----
> From: [email protected] [mailto:computer-go-
> [email protected]] On Behalf Of Magnus Persson
> Sent: Wednesday, June 24, 2009 5:55 AM
> To: computer-go; Don Dailey
> Subject: Re: [computer-go] Re: fuego strength
> 
> On 9x9 I have been worried about the lack of strong anchors, but not
> enough to complain about it. What I think is more important is that
> stronger programs are actually active on CGOS for longer periods of
> time. I tried to contribute more by having versions of Valkyria online
> with a fixed number of playouts. The nice part of that is that I can
> then run other tests on the same machine that all use a fixed number of
> playouts and still get proper results. If I run a full strength
> version of Valkyria on CGOS I cannot have anything else running.
> 
> So, I think if more people with strong programs had reduced versions
> running, we could have many middle strength programs, and it would also
> become more meaningful to play with full strength programs.
> 
> I am looking forward to the new server because I think everyone
> would/should be eager to login to it.
> 
> Magnus
> 
> Quoting Don Dailey <[email protected]>:
> 
> > 2009/6/24 Christian Nentwich <[email protected]>
> >
> >>  Don,
> >>
> >> you might have your work cut out if you try to control inflation
> >> directly, that can turn into a black art very quickly. Multiple
> >> anchors would be preferable. An offline, X * 1000 game playoff
> >> between gnugo and another candidate anchor would be enough to fix
> >> their rating difference. If their bilateral winnings drift away
> >> during continuous play, the anchor rating could be tweaked.
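
A minimal sketch of how such an offline playoff could be turned into a rating difference, assuming the standard logistic Elo model (the thread does not say which formula CGOS uses). The function name `elo_difference` and the win/loss counts are hypothetical.

```python
import math


def elo_difference(wins: int, losses: int) -> float:
    """Elo gap implied by a head-to-head record, from the winner's side.

    Assumes the usual logistic model: expected score = 1 / (1 + 10**(-d/400)).
    """
    games = wins + losses
    if games == 0:
        raise ValueError("no games played")
    score = wins / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("a one-sided record gives an unbounded difference")
    # Invert the logistic expected-score formula to recover d.
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # Hypothetical playoff: candidate anchor wins 760 of 1000 games.
    print(round(elo_difference(760, 240), 1))
```

The more games in the playoff, the tighter the estimate; a drift in this number during continuous play is the signal for tweaking the second anchor's rating.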
> >>
> >
> > It's all a black art anyway.  The anchor itself absorbs (or gives away)
> > rating points into the pool.  There is not much difference if I just
> > use it to monitor the inflation/deflation and control it directly -
> > except that I have the ability to control the magnitude of this
> > adjustment.  And the advantage is that the anchor player becomes a
> > monitor of the inflation level.
> >
> > Don't worry, I don't plan to change it from what I'm doing.  The
> > anchor can still monitor inflation if I track what adjustment I would
> > normally make to it if it were not an anchor.  For instance if the
> > opponent adjustments tended to be more negative than positive it would
> > indicate that the entire pool was overrated.  A way to help
> > compensate is to adjust the initial rating of new players.  However,
> > the first game against a brand new player is not rated for the
> > established player and the K constant is so low (for the new player's
> > opponents) that it hardly matters.  Each player starts with a high
> > K and it gradually drops to 3.  But this K is modified from 0% to
> > 100% depending on the opponent's K - so the first game against a new
> > player is effectively not rated for his opponent.  But I think the
> > initial value does have an impact on deflation/inflation of the entire
> > pool.
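
A rough sketch of the K scaling described above. Only the floor of 3 and the 0% to 100% range come from the message; the starting value `K_MAX = 32`, the linear weighting, and all function names are assumptions made for illustration, not the CGOS implementation.

```python
K_MIN = 3.0   # from the message: K gradually drops to 3
K_MAX = 32.0  # hypothetical starting K for a brand new player


def expected_score(rating: float, opp_rating: float) -> float:
    """Standard logistic Elo expectation (assumed, not confirmed for CGOS)."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


def effective_k(own_k: float, opp_k: float) -> float:
    """Scale own K from 0% to 100% by how settled the opponent is.

    Assumption: linear weight, 0 when the opponent is brand new (K_MAX)
    and 1 when the opponent is fully established (K_MIN).
    """
    weight = (K_MAX - opp_k) / (K_MAX - K_MIN)
    weight = min(1.0, max(0.0, weight))
    return own_k * weight


def updated_rating(rating: float, own_k: float,
                   opp_rating: float, opp_k: float, score: float) -> float:
    """New rating after one game; score is 1.0 for a win, 0.0 for a loss."""
    k = effective_k(own_k, opp_k)
    return rating + k * (score - expected_score(rating, opp_rating))


if __name__ == "__main__":
    # Established player (K = 3) beats a brand new player: no change.
    print(updated_rating(1800.0, 3.0, 1800.0, K_MAX, 1.0))
    # Same win against an established player: the full K applies.
    print(updated_rating(1800.0, 3.0, 1800.0, K_MIN, 1.0))
```

Under these assumptions the established player's rating is untouched by a first game against a newcomer, which matches the "effectively not rated" behaviour described in the message.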
> >
> >
> >
> >>
> >>
> >> I'm not sure if the worries voiced on this list about anchors are not
> >> somewhat overdone.
> >>
> >
> > I'm pretty sure it's overdone, but I reserve judgment.  I know the
> > phenomenon of self-play intransitivity exists, but it's minor.  This
> > is something that can easily be tested privately with 100,000 games or
> > so to get the amount nailed down - at least for specific trios of
> > players.  I think I may try gnugo vs fuego at 2 different levels.
> >
> > I think that MCTS programs are all similar and that this is the bigger
> > issue.  And as you say, gnugo introduces bias too, it's unavoidable.
> >
> >
> >> Other bots, with improvements, may do just as well against an old
> >> version of Fuego as the full Fuego does, we don't know. Maybe they
> >> would do better than new versions of Fuego. All this reliance on gnugo
> >> introduces bias, too, and after all the anchor player is not a single
> >> control variable that determines the destiny of the server. Players
> >> will still play many different opponents. If Fuego keeps beating the
> >> anchor player but losing to everybody else, it still won't get a
> >> higher rank.
> >>
> >> For me, gnugo as an anchor is fine, as I am still experimenting
> >> around a low ELO level. For authors of strong programs: I am quite
> >> surprised that you are not insisting on a much more highly rated
> >> anchor. I remember when KGS was anchored in the kyu ranks, many years
> >> ago. I found myself 7 dan one day, until somebody intervened and
> >> reanchored the server. The territory far above a single anchor player
> >> is unsafe.
> >>
> >
> > The thought has occurred to me that I should not worry about low
> > resource anchors and that I should simply bite the bullet and insist,
> > as you say, on much stronger anchor players.  But the tone of these
> > discussions indicates that few consider that very important.  I'm glad
> > to hear that I am not the only one.  If I did do this it would not
> > need to disrupt the pool - I would still run the standard gnugo player
> > that I currently use as an anchor and use it as a way to monitor the
> > "new" anchor - at least for the first 100,000 games of the new anchor.
> >
> > I have no problem using programs under heavy development either.  What
> > people are missing is that I don't use the latest version, I simply
> > pick a good version and stick with that.  For instance I do not
> > upgrade gnugo - I continue to use the same version I started with.  So
> > the anchor is not continuously improving - it is a constant.
> >
> > - Don
> >
> >
> >
> >
> >>
> >>
> >> Christian
> >>
> >>
> >>
> >>
> >> On 24/06/2009 05:28, Don Dailey wrote:
> >>
> >> From what I have discovered so far, there is no compelling reason to
> >> change anchors.   What I really was hoping we could do is UPGRADE the
> >> anchor, since many programs are now far stronger than 1800.
> >>
> >> Fuego is pretty strong, but not when it plays at the same CPU
> >> intensity as gnugo.  I went up to 5000 simulations and the match is
> >> fairly close and the time is about the same.  Going from 3000 to 5000
> >> was quite a remarkable jump in strength and no doubt we could run at
> >> 10,000 and have substantial superiority - but that's not really what
> >> I had in mind.
> >>
> >> So I think I agree with all the comments I have received so far - and
> >> my own observations and testing: there is no compelling reason to
> >> change.
> >>
> >> Now if fuego were substantially stronger using fewer resources, I
> >> would be more eager to change after carefully calibrating the
> >> difference, but that is not the case, at least not at 19x19.
> >>
> >> There is another way to keep ratings stable and that is to monitor
> >> key players over time and build a deflation/inflation mechanism into
> >> the server to keep it in tune.  For instance if there were no
> >> anchors, the server could monitor gnugo and if it were to gradually
> >> drop in rating, I could make minor adjustments to the ratings of
> >> winners and losers to compensate gradually over time.  For example
> >> the winner could get 1% more ELO and the loser could lose 1% less ELO
> >> when in inflation mode and just the opposite when in deflation mode.
> >> In this way I could feed points into the rating pool, or gradually
> >> extract them as needed.  I don't plan to do this, but there is more
> >> than one way to skin a cat.
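
One way to read the 1% example above, as a minimal sketch. The function name `adjusted_deltas`, the mode strings, and the idea of applying the bias to a symmetric per-game delta are assumptions for illustration; the message explicitly says this mechanism is not planned for the server.

```python
def adjusted_deltas(delta: float, mode: str, bias: float = 0.01):
    """Return (winner_change, loser_change) for one rated game.

    delta is the symmetric Elo change the game would normally produce.
    In "inflate" mode the winner gains 1% more and the loser loses 1% less,
    so points are fed into the pool; "deflate" does the opposite.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if mode == "inflate":
        return delta * (1.0 + bias), -delta * (1.0 - bias)
    if mode == "deflate":
        return delta * (1.0 - bias), -delta * (1.0 + bias)
    if mode == "neutral":
        return delta, -delta
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("inflate", "neutral", "deflate"):
        win, loss = adjusted_deltas(10.0, mode)
        # The sum is the net number of points added to the pool by this game.
        print(mode, win, loss, win + loss)
```

With a normal exchange of 10 points, each game adds roughly 0.2 points to the pool in inflation mode and removes roughly 0.2 in deflation mode, so the correction is gradual.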
> >>
> >> If we use more than one player as anchors, I would still pick one
> >> player as the standard, and periodically adjust the "other" anchors
> >> based on their global performance rating - since they will all tend
> >> to drift around relative to each other and I would not want to make
> >> any assumptions about what the other anchors should be.  We cannot
> >> just say gnugo is 1800, fuego is 2000, etc. because we don't really
> >> know the exact difference between the 2.  But we could refine this
> >> over time.
> >>
> >> - Don
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Jun 23, 2009 at 11:34 PM, David Fotland
> >> <[email protected]>wrote:
> >>
> >>> I'd also prefer to use gnugo as an anchor.  Since fuego is under
> >>> development, new versions will be playing with an older version of
> >>> itself.  Fuego will win more often against its old version.  I don't
> >>> care about it distorting Fuego's rating, but it will distort the
> >>> rating system.  If new fuego is on with few other programs it will
> >>> gain rating points, then when other programs come new fuego will
> >>> give them to the other programs as its rating drops.  The effect
> >>> will be to make the rating system less stable, so it's hard to use
> >>> cgos to evaluate new versions of programs to see if they are
> >>> stronger.
> >>>
> >>> I think it's best to use an anchor that's not under active
> >>> development.  I like gnugo since there are lots of published results
> >>> against it, and it is not changing rapidly.  Also it has a different
> >>> style than the monte carlo programs, so it's more likely to expose
> >>> bugs in the monte carlo programs.
> >>>
> >>> David
> >>>
> >>> > -----Original Message-----
> >>> > From: [email protected] [mailto:computer-go-
> >>> > [email protected]] On Behalf Of Hideki Kato
> >>> > Sent: Tuesday, June 23, 2009 5:15 PM
> >>> > To: computer-go
> >>> > Subject: [computer-go] Re: fuego strength
> >>> >
> >>> > I'm running Fatman1, GNU Go and GNU Go MC version for 9x9 and two
> >>> > instances of GNU Go for 13x13, five programs in total, on a
> >>> > dual-core Athlon at home.
> >>> >
> >>> > I strongly believe current anchors are resource friendly enough for
> >>> > older Pentium 3, 4 or even Celeron processors and do not need to
> >>> > be changed.
> >>> >
> >>> > Changing anchors is a big problem, similar to changing the
> >>> > International prototypes.  Also, GNU Go is used as a reference in
> >>> > almost every computer-go research these days.
> >>> >
> >>> > I'm against that idea, especially for 19x19.
> >>> >
> >>> > Hideki
> >>> >
> >>> > Don Dailey: <
> >>> [email protected]>:
> >>> > >I'm trying now to get a rough idea about the strength of fuego and
> >>> > >its suitability as the anchor player.
> >>> > >
> >>> > >Right now the numbers are very rough as I need more samples.  I'm
> >>> > >currently looking at:
> >>> > >
> >>> > >  1.  9x9 fuego at 1000 simulations
> >>> > >
> >>> > >  2. 19x19 fuego at 3000 simulations.
> >>> > >
> >>> > >
> >>> > >I'm testing against the current CGOS anchors, so FatMan vs fuego
> >>> > >at 9x9 and gnugo-3.7.10 at 19x19.
> >>> > >
> >>> > >
> >>> > >At 9x9 fuego appears to be substantially stronger than FatMan,
> >>> > >perhaps 100-200 ELO.  It also is far faster at 1000 simulations
> >>> > >than FatMan, which requires many more simulations to reach anchor
> >>> > >strength.  So there is no question about fuego being a capable
> >>> > >anchor for small boards.  At this level on 9x9 FatMan is also
> >>> > >stronger than gnugo, so fuego is far stronger than gnugo on 9x9
> >>> > >and is very resource friendly too.
> >>> > >
> >>> > >At 19x19 the story is a bit different.  gnugo appears to be
> >>> > >significantly stronger, but about twice as slow.  There is not
> >>> > >enough data to narrow this down much, but fuego appears to be over
> >>> > >200 ELO weaker at this level.
> >>> > >
> >>> > >Since fuego is using only about half the CPU resources of gnugo, I
> >>> > >can increase the level.  I've only played 30 games at 19x19, so
> >>> > >this conclusion is subject to significant error, but it's enough
> >>> > >to conclude that it's almost certainly weaker at this level but
> >>> > >perhaps not when run at the same CPU intensity as gnugo.
> >>> > >
> >>> > >Of course at higher levels yet, fuego would be far stronger than
> >>> > >gnugo-3.7.10 as seen in the 19x19 cgos tables.  But I'm hoping not
> >>> > >to push the anchors too hard - hopefully they can be run on
> >>> > >someone's older spare computer or set unobtrusively in the
> >>> > >background on someone's desktop machine.
> >>> > >
> >>> > >
> >>> > >- Don
> >>> > >_______________________________________________
> >>> > >computer-go mailing list
> >>> > >[email protected]
> >>> > >http://www.computer-go.org/mailman/listinfo/computer-go/
> >>> > --
> >>> > [email protected] (Kato)
> >>>
> >>
> >
> 
> 
> 
> --
> Magnus Persson
> Berlin, Germany

