I suspect the reason they were able to train a value net with multiple komi
at the same time reasonably well is that the training games used in that
paper were generated by a pure policy net (itself trained on human games),
rather than by an MCTS player.

Although humans give up points for safety when ahead, in practice they seem
to do so less than MCTS players of the same strength, so a policy net
trained on human games would not be expected to feature that tendency as
strongly as one trained on MCTS games, leading to less of a bias when
adjusting the komi. It might also be somewhat hard for a pure policy net to
learn to evaluate the board to, say, within +/- 3 points during the macro
and micro endgame, in order to determine when it should predict more
conservative moves, if the policy net was never directly trained to predict
the value at the same time. Particularly so if the data set also included
many 0.5 komi games and the policy net was not told the komi. So one might
guess that a pure policy net would tend to give up points for safety even
less than the human games it was trained on.
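Just to make the "told the komi" idea concrete, here is a rough sketch (not
the paper's actual architecture; layer sizes, names, and the scaling
constant are my own guesses) of a value head that is conditioned on the komi
by appending one constant input plane:

    # Sketch only: a komi-conditioned value head. All dimensions and the
    # komi scaling are illustrative assumptions, not taken from the paper.
    import torch
    import torch.nn as nn

    class KomiAwareValueHead(nn.Module):
        def __init__(self, in_planes=256, board_size=19):
            super().__init__()
            # +1 input plane carrying the (scaled) komi at every point
            self.conv = nn.Conv2d(in_planes + 1, 1, kernel_size=1)
            self.fc1 = nn.Linear(board_size * board_size, 256)
            self.fc2 = nn.Linear(256, 1)

        def forward(self, features, komi):
            # features: (N, in_planes, 19, 19); komi: (N,) in points
            n, _, h, w = features.shape
            komi_plane = (komi / 15.0).view(n, 1, 1, 1).expand(n, 1, h, w)
            x = torch.cat([features, komi_plane], dim=1)
            x = torch.relu(self.conv(x)).flatten(1)
            x = torch.relu(self.fc1(x))
            return torch.tanh(self.fc2(x))  # predicted outcome in [-1, 1]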

All of this might help explain why the data set they used for training the
value net could reasonably be reused, without introducing too much bias,
when rescoring the same games with different komi.
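The rescoring itself is cheap once you have the raw score margin of each
game; under area scoring the winner at a new komi is just a comparison. A
toy sketch (the record format here is invented for illustration):

    # Toy sketch of relabelling finished games for a different komi.
    # Assumes each game record stores Black's raw board margin before komi.
    def value_target(black_margin_before_komi, komi):
        """Return +1 if Black wins under the given komi, -1 otherwise."""
        return 1.0 if black_margin_before_komi - komi > 0 else -1.0

    # Example: a game Black finished 7 points ahead on the board
    for komi in (5.5, 6.5, 7.5):
        print(komi, value_target(7.0, komi))
    # 5.5 -> +1, 6.5 -> +1, 7.5 -> -1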



On Thu, Oct 26, 2017 at 6:33 PM, Shawn Ligocki <sligo...@gmail.com> wrote:

> On Thu, Oct 26, 2017 at 2:02 PM, Gian-Carlo Pascutto <g...@sjeng.org>
> wrote:
>
>> On 26-10-17 15:55, Roel van Engelen wrote:
>> > @Gian-Carlo Pascutto
>> >
>> > Since training uses a ridiculous amount of computing power I wonder
>> > if it would be useful to make certain changes for future research,
>> > like training the value head with multiple komi values
>> > <https://arxiv.org/pdf/1705.10701.pdf>
>>
>> Given that the game data will be available, it will be trivial for
>> anyone to train a different network architecture on the result and see
>> if they get better results, or a program that handles multiple komi
>> values, etc.
>>
>> The problem is getting the *data*, not the training.
>>
>
> But the data should be different for different komi values, right?
> Iteratively producing self-play games and training with the goal of
> optimizing for komi 7 should converge to a different optimal player than
> optimizing for komi 5. But maybe having high quality data for komi 7 will
> still save a lot of the work for training a komi 5 (or komi agnostic)
> network?
>
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go