One tricky thing is that, at these higher levels, there are some major
nonlinearities between different bots early in the opening, and they break
the Elo model's assumptions quite blatantly.

The most noticeable case of this is Mi Yuting's flying dagger joseki. I've
noticed, for example, that in particular matchups between pairs of bots
(e.g. one particular KataGo net as white versus ELF as black, or one version
of LZ as black versus some other version as white), as many as 30% of games
may enter this joseki. The bots' preferences may happen by chance to line up
so that they consistently play down a path where one side hits a blind spot
and starts the game at an early disadvantage. Each bot has its own
preferences, so whether a given pairing runs into such a trap or not is
essentially arbitrary.
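
To put rough numbers on how much one trap line can skew a single pairing,
here is a back-of-envelope sketch using the standard logistic Elo formula.
The 30% entry rate is the figure above; the 0.30 score for the trapped side
inside that line is a made-up number for illustration, and the helper names
are hypothetical.

```python
import math

def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard logistic Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def implied_elo_diff(score: float) -> float:
    """Rating gap (A minus B) the Elo model would infer from A's mean score."""
    return 400.0 * math.log10(score / (1.0 - score))

def pairing_score(base_score: float, trap_rate: float,
                  trapped_score: float) -> float:
    """Mean score for one side when a fraction `trap_rate` of games enter a
    line where that side only scores `trapped_score` (hypothetical model)."""
    return (1.0 - trap_rate) * base_score + trap_rate * trapped_score

if __name__ == "__main__":
    base = expected_score(3000.0, 3000.0)  # two equally rated bots -> 0.5
    mixed = pairing_score(base, trap_rate=0.30, trapped_score=0.30)
    print(f"mean score {mixed:.2f}, "
          f"implied gap {implied_elo_diff(mixed):+.0f} Elo")
```

With those hypothetical numbers, two bots the model considers equal would
score about 0.44 against each other, which reads as a gap of roughly 42 Elo,
even though the gap exists only in this one pairing and a single scalar
rating per bot has no way to represent it.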

Having significant early-game temperature in the bot itself also doesn't
always help as much as you would think. This joseki is so sharp that a bot
can easily have a strong enough preference for one path or another (even
when that path is ultimately wrong) to override any reasonable temperature.
Sometimes adding temperature or extra randomness only mildly changes how
often the sequence occurs, or just delays the joseki, and the trap or
blunder happens anyway.
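
A quick sketch of why temperature may not rescue things, assuming
temperature is applied as p_i^(1/T) over the bot's move distribution and
then renormalized. The real bots each implement temperature somewhat
differently (e.g. over visit counts), so this is only an illustration; the
0.98 policy and the three critical moves are hypothetical, and the choices
are treated as independent.

```python
def apply_temperature(probs: list[float], temperature: float) -> list[float]:
    """Rescale a move distribution as p_i ** (1 / T), then renormalize.
    Equivalent to softmax(log(p) / T). Assumed form, not any bot's code."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    return [w / total for w in weights]

def prob_line_followed(p_preferred: float, critical_moves: int) -> float:
    """Chance the preferred move is chosen at every one of `critical_moves`
    decision points, assuming each choice is independent."""
    return p_preferred ** critical_moves

if __name__ == "__main__":
    policy = [0.98, 0.01, 0.01]  # hypothetical: one dominant joseki move
    for t in (1.0, 1.5, 2.0):
        p = apply_temperature(policy, t)[0]
        print(f"T={t}: preferred move {p:.3f}, "
              f"whole line {prob_line_followed(p, 3):.3f}")
```

Under these assumptions, raising T from 1.0 to 1.5 only drops the preferred
move from 0.98 to about 0.91, so roughly three quarters of games still play
the whole line, and even T=2 leaves more than half of them on it.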

If games are to begin from the empty board, I'm not sure there's an easy
way around this other than having a very large variety of opponents.

One thing that I'm pretty sure would mostly "fix" the problem (in the sense
of producing a smoother metric of general strength across a variety of
positions, not heavily affected by just a few key lines) would be to
semi-arbitrarily sample a very large set of positions from a wide range of
human professional games, say at move 20, and have the bots play from these
sampled positions in pairs, once with each color. This would still include
many AI openings, because human pros in the last 3-4 years have quickly
integrated and experimented with them, but it would also introduce far more
variety than would occur in any head-to-head matchup.
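
Here is a minimal sketch of that scheduling idea. Game records are assumed
to be already parsed into lists of move strings (SGF loading isn't shown),
and the 20-move cutoff, sample size, bot names, and the `Game`,
`sample_openings` and `paired_schedule` names are all placeholders.

```python
import random
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Game:
    start_position: tuple[str, ...]  # first `cutoff` moves of a pro game record
    black: str
    white: str

def sample_openings(pro_games: list[list[str]], n: int,
                    cutoff: int = 20, seed: int = 0) -> list[tuple[str, ...]]:
    """Pick up to `n` distinct move-`cutoff` prefixes from long-enough games."""
    prefixes = sorted({tuple(g[:cutoff]) for g in pro_games if len(g) >= cutoff})
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(prefixes, min(n, len(prefixes)))

def paired_schedule(bots: list[str],
                    openings: list[tuple[str, ...]]) -> list[Game]:
    """Every pair of bots plays every opening twice, once with each color."""
    games = []
    for a, b in combinations(bots, 2):
        for opening in openings:
            games.append(Game(opening, black=a, white=b))
            games.append(Game(opening, black=b, white=a))
    return games

if __name__ == "__main__":
    # Toy stand-ins for parsed pro game records (hypothetical data).
    pro_games = [[f"g{i}m{k}" for k in range(40)] for i in range(5)]
    openings = sample_openings(pro_games, n=3)
    schedule = paired_schedule(["katago", "elf", "leela-zero"], openings)
    print(len(schedule), "games")  # 3 pairs x 3 openings x 2 colors
```

Playing both colors from the same start position means any imbalance in a
sampled position largely cancels out within each pair of games.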

This is almost surely a *smaller* problem than simply having enough games
mixed between different long-running bots to anchor the Elo system, and it
is not the only way major nontransitivities can show up (ladders are
another). But to take a page from computer chess, where playing from sampled
forced openings seems to be common practice, maybe it's worth considering in
computer Go as well, even if it only fixes what is currently the smaller of
the two issues.


On Thu, Jan 21, 2021 at 12:01 PM Rémi Coulom <remi.cou...@gmail.com> wrote:

> Thanks for computing the new rating list.
>
> I feel it did not fix anything. The old Zen, cronus, etc. have almost no
> change at all.
>
> So it is not a good fix, in my opinion. No need to change anything to the
> official ratings.
>
> The fundamental problem seems to be that the Elo rating model is too wrong
> for this data, and there is no easy fix for that.
>
> Long ago, I had thought about using a more complex multi-dimensional Elo
> model. The CGOS data may be a good opportunity to try it. I will try when I
> have some free time.
>
> Rémi
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>