I can have a reduced version of Many Faces up all the time on an old computer, but I don't monitor it, so someone would have to email and remind me when it goes down (usually because of a Microsoft automatic reboot :( )
David

> -----Original Message-----
> From: [email protected] [mailto:computer-go-[email protected]] On Behalf Of Magnus Persson
> Sent: Wednesday, June 24, 2009 5:55 AM
> To: computer-go; Don Dailey
> Subject: Re: [computer-go] Re: fuego strength
>
> On 9x9 I have been worrying about the lack of strong anchors, but not enough to complain. What I think is more important is that stronger programs are actually active on CGOS for longer periods of time. I tried to contribute more by having versions of Valkyria online with a fixed number of playouts. The nice part of that is that I can then run other tests on the same machine that all use a fixed number of playouts and still get proper results. If I run a full strength version of Valkyria on CGOS I cannot have anything else running.
>
> So, I think if more people with strong programs had reduced versions running, we could have many middle strength programs, and it would also become more meaningful to play with full strength programs.
>
> I am looking forward to the new server because I think everyone would/should be eager to log in to it.
>
> Magnus
>
> Quoting Don Dailey <[email protected]>:
>
> > 2009/6/24 Christian Nentwich <[email protected]>
> >
> >> Don,
> >>
> >> you might have your work cut out if you try to control inflation directly, that can turn into a black art very quickly. Multiple anchors would be preferable. An offline, X * 1000 game playoff between gnugo and another candidate anchor would be enough to fix their rating difference. If their bilateral winnings drift away during continuous play, the anchor rating could be tweaked.
> >>
> >
> > It's all a black art anyway. The anchor itself absorbs (or gives away) rating points into the pool. There is not much difference if I just use it to monitor the inflation/deflation and control it directly - except that I have the ability to control the magnitude of this adjustment.
> > And the advantage is that the anchor player becomes a monitor of the inflation level.
> >
> > Don't worry, I don't plan to change it from what I'm doing. The anchor can still monitor inflation if I track what adjustment I would normally make to it if it were not an anchor. For instance if the opponent adjustments tended to be more negative than positive it would indicate that the entire pool was overrated. A way to help compensate is to adjust the initial rating of new players. However, the first game against a brand new player is not rated for the established player and the K constant is so low (for the new player's opponents) that it hardly matters. Each player starts with a high K and it gradually drops to 3. But this K is modified from 0% to 100% depending on the opponent's K - so the first game against a new player is effectively not rated for his opponent. But I think the initial value does have an impact on deflation/inflation of the entire pool.
> >
> >> I'm not sure if the worries voiced on this list about anchors are not somewhat overdone.
> >>
> >
> > I'm pretty sure it's overdone, but I reserve judgment. I know the phenomenon of self-play intransitivity exists, but it's minor. This is something that can easily be tested privately with 100,000 games or so to get the amount nailed down - at least for specific trios of players. I think I may try gnugo vs fuego at 2 different levels.
> >
> > I think that MCTS programs are all similar and that this is the bigger issue. And as you say, gnugo introduces bias too, it's unavoidable.
> >
> >> Other bots, with improvements, may do just as well against an old version of Fuego as the full Fuego does, we don't know. Maybe they would do better than new versions of Fuego.
> >> All this reliance on gnugo introduces bias, too, and after all the anchor player is not a single control variable that determines the destiny of the server. Players will still play many different opponents. If Fuego keeps beating the anchor player but losing to everybody else, it still won't get a higher rank.
> >>
> >> For me, gnugo as an anchor is fine, as I am still experimenting around a low ELO level. For authors of strong programs: I am quite surprised that you are not insisting on a much more highly rated anchor. I remember when KGS was anchored in the kyu ranks, many years ago. I found myself 7 dan one day, until somebody intervened and reanchored the server. The territory far above a single anchor player is unsafe.
> >>
> >
> > The thought has occurred to me that I should not worry about low resource anchors and that I should simply bite the bullet and insist, as you say, on much stronger anchor players. But the tone of these discussions indicates that few consider that very important. I'm glad to hear that I am not the only one. If I did do this it would not need to disrupt the pool - I would still run the standard gnugo player that I currently use as an anchor and use it as a way to monitor the "new" anchor - at least for the first 100,000 games of the new anchor.
> >
> > I have no problem using programs under heavy development either. What people are missing is that I don't use the latest version, I simply pick a good version and stick with that. For instance I do not upgrade gnugo - I continue to use the same version I started with. So the anchor is not continuously improving - it is a constant.
> >
> > - Don
> >
> >>
> >> Christian
> >>
> >> On 24/06/2009 05:28, Don Dailey wrote:
> >>
> >> From what I have discovered so far, there is no compelling reason to change anchors.
> >> What I really was hoping we could do is UPGRADE the anchor, since many programs are now far stronger than 1800.
> >>
> >> Fuego is pretty strong, but not when it plays at the same CPU intensity as gnugo. I went up to 5000 simulations and the match is fairly close and the time is about the same. Going from 3000 to 5000 was quite a remarkable jump in strength and no doubt we could run at 10,000 and have substantial superiority - but that's not really what I had in mind.
> >>
> >> So I think I agree with all the comments I have received so far - and my own observations and testing, there is no compelling reason to change.
> >>
> >> Now if fuego was substantially stronger using less resources, I would be more eager to change after carefully calibrating the difference, but that is not the case, at least not at 19x19.
> >>
> >> There is another way to keep ratings stable and that is to monitor key players over time and build a deflation/inflation mechanism into the server to keep it in tune. For instance if there were no anchors, the server could monitor gnugo and if it were to gradually drop in rating, I could make minor adjustments to the ratings of winners and losers to compensate gradually over time. For example the winner could get 1% more ELO and the loser could lose 1% less ELO when in inflation mode and just the opposite when in deflation mode. In this way I could feed points into the rating pool, or gradually extract them as needed. I don't plan to do this, but there is more than one way to skin a cat.
> >>
> >> If we use more than one player as anchors, I would still pick one player as the standard, and periodically adjust the "other" anchors based on their global performance rating - since they will all tend to drift around relative to each other and I would not want to make any assumptions about what the other anchors should be. We cannot just say gnugo is 1800, fuego is 2000, etc because we don't really know the exact difference between the 2. But we could refine this over time.
> >>
> >> - Don
> >>
> >> On Tue, Jun 23, 2009 at 11:34 PM, David Fotland <[email protected]> wrote:
> >>
> >>> I'd also prefer to use gnugo as an anchor. Since fuego is under development, new versions will be playing with an older version of itself. Fuego will win more often against its old version. I don't care about it distorting Fuego's rating, but it will distort the rating system. If new fuego is on with few other programs it will gain rating points, then when other programs come new fuego will give them to the other programs as its rating drops. The effect will be to make the rating system less stable, so it's hard to use cgos to evaluate new versions of programs to see if they are stronger.
> >>>
> >>> I think it's best to use an anchor that's not under active development. I like gnugo since there are lots of published results against it, and it is not changing rapidly. Also it has a different style than the monte carlo programs, so it's more likely to expose bugs in the monte carlo programs.
> >>>
> >>> David
> >>>
> >>> > -----Original Message-----
> >>> > From: [email protected] [mailto:computer-go-[email protected]] On Behalf Of Hideki Kato
> >>> > Sent: Tuesday, June 23, 2009 5:15 PM
> >>> > To: computer-go
> >>> > Subject: [computer-go] Re: fuego strength
> >>> >
> >>> > I'm running Fatman1, GNU Go and GNU Go MC version for 9x9 and two instances of GNU Go for 13x13, five programs in total, on a dual-core Athlon at home.
> >>> >
> >>> > I strongly believe current anchors are resource friendly enough for older pentium 3, 4 or even Celeron processors and do not need to be changed.
> >>> >
> >>> > Changing anchors is a big problem, similar to changing the International prototypes. Also, GNU Go is used as a reference in almost all computer-go research these days.
> >>> >
> >>> > I'm against that idea, especially for 19x19.
> >>> >
> >>> > Hideki
> >>> >
> >>> > Don Dailey: <[email protected]>:
> >>> > >I'm trying now to get a rough idea about the strength of fuego and its suitability as the anchor player.
> >>> > >
> >>> > >Right now the numbers are very rough as I need more samples. I'm currently looking at:
> >>> > >
> >>> > >  1. 9x9 fuego at 1000 simulations
> >>> > >
> >>> > >  2. 19x19 fuego at 3000 simulations.
> >>> > >
> >>> > >I'm testing against the current CGOS anchors, so FatMan vs fuego at 9x9 and gnugo-3.7.10 at 19x19.
> >>> > >
> >>> > >At 9x9 fuego appears to be substantially stronger than FatMan, perhaps 100-200 ELO. It also is far faster at 1000 simulations than fatman, which requires many more simulations to reach anchor strength. So there is no question about fuego being a capable anchor for small boards.
> >>> > >At this level on 9x9 FatMan is also stronger than gnugo, so fuego is far stronger than gnugo on 9x9 and is very resource friendly too.
> >>> > >
> >>> > >At 19x19 the story is a bit different. gnugo appears to be significantly stronger, but about twice as slow. There is not enough data to narrow this down much, but fuego appears to be over 200 ELO weaker at this level.
> >>> > >
> >>> > >Since fuego is using only about half the CPU resources of gnugo, I can increase the level. I've only played 30 games at 19x19, so this conclusion is subject to significant error, but it's enough to conclude that it's almost certainly weaker at this level but perhaps not when run at the same CPU intensity as gnugo.
> >>> > >
> >>> > >Of course at higher levels yet, fuego would be far stronger than gnugo-3.7.10 as seen in the 19x19 cgos tables. But I'm hoping not to push the anchors too hard - hopefully they can be run on someone's older spare computer or set unobtrusively in the background on someone's desktop machine.
> >>> > >
> >>> > >- Don
> >>> > >_______________________________________________
> >>> > >computer-go mailing list
> >>> > >[email protected]
> >>> > >http://www.computer-go.org/mailman/listinfo/computer-go/
> >>> > --
> >>> > [email protected] (Kato)
>
> --
> Magnus Persson
> Berlin, Germany
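
For anyone who wants to experiment with the two rating mechanics Don describes in the quoted messages (a player's K being scaled from 0% to 100% by the opponent's K, and the 1% inflation/deflation adjustment), here is a minimal sketch. It is not CGOS code. Only the K floor of 3 and the 1% figure come from the thread; the starting K of 32, the linear scaling in k_weight, the standard Elo expected-score formula, and all function and parameter names are assumptions made for illustration.

```python
K_MIN = 3.0   # from the thread: K "gradually drops to 3"
K_MAX = 32.0  # assumption: starting K of a brand-new player (not stated in the thread)


def expected_score(rating, opponent_rating):
    """Standard Elo expected score of `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def k_weight(opponent_k):
    """Assumed linear weight: 0.0 against a brand-new opponent (K_MAX),
    1.0 against a fully established one (K_MIN)."""
    clamped = min(max(opponent_k, K_MIN), K_MAX)
    return (K_MAX - clamped) / (K_MAX - K_MIN)


def rating_delta(rating, k, opponent_rating, opponent_k, score,
                 mode="neutral", bias=0.01):
    """Rating change for one game; score is 1.0 for a win, 0.0 for a loss.

    mode="inflate": a gain is 1% larger and a loss is 1% smaller, so
    points flow into the pool. mode="deflate" does the opposite.
    """
    delta = k * k_weight(opponent_k) * (score - expected_score(rating, opponent_rating))
    if mode == "inflate":
        delta *= (1.0 + bias) if delta > 0 else (1.0 - bias)
    elif mode == "deflate":
        delta *= (1.0 - bias) if delta > 0 else (1.0 + bias)
    return delta


if __name__ == "__main__":
    # Two established players (K = 3) with equal ratings, in inflation mode.
    win = rating_delta(1800, 3.0, 1800, 3.0, 1.0, mode="inflate")
    loss = rating_delta(1800, 3.0, 1800, 3.0, 0.0, mode="inflate")
    print(win, loss, win + loss)  # the sum is the net change in the pool
```

In neutral mode the two deltas of a game between established players cancel, so the pool total is unchanged; in inflation or deflation mode the sum is slightly positive or negative, which is the "feed points in, or gradually extract them" effect described above. Against a brand-new opponent k_weight returns 0, so the established player's rating does not move.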
