Good point, Roel. Perhaps in the final layers one could make it predict a
model of the expected score distribution (before combining with the komi
and other rules-specific adjustments for handicap stones, pass stones,
last-play parity, etc.). Should be easy enough to back-propagate win/loss
information (and perhaps even more) through such a model.
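
A rough sketch of what I mean, in plain Python. Everything here is my own
assumption for illustration, not anything from the paper: the final layer
emits one logit per pre-komi score margin (Black minus White), the names
MARGINS, black_win_prob and win_loss_grad are made up, and only komi is
applied (handicap or pass-stone adjustments would be further offsets of the
same kind). The win probability is the mass above komi, and the win/loss
cross-entropy gradient flows straight back to the score logits.

```python
import math

# Hypothetical setup: one logit per possible pre-komi score margin
# (Black minus White) on a 19x19 board, i.e. -361 .. +361 points.
MARGINS = list(range(-361, 362))


def softmax(logits):
    """Turn logits into a probability distribution over MARGINS."""
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def black_win_prob(logits, komi):
    """P(Black wins) = probability mass on margins strictly above komi."""
    p = softmax(logits)
    return sum(pi for pi, s in zip(p, MARGINS) if s > komi)


def win_loss_grad(logits, komi, black_won):
    """Gradient of the binary cross-entropy win/loss loss w.r.t. the logits."""
    p = softmax(logits)
    wins = [1.0 if s > komi else 0.0 for s in MARGINS]
    w = sum(pi * wi for pi, wi in zip(p, wins))
    w = min(max(w, 1e-12), 1.0 - 1e-12)  # avoid division by zero
    # loss = -log(w) if Black won, else -log(1 - w)
    dloss_dw = -1.0 / w if black_won else 1.0 / (1.0 - w)
    # dw/dlogit_i = p_i * (wins_i - w), from the softmax Jacobian
    return [dloss_dw * pi * (wi - w) for pi, wi in zip(p, wins)]
```

Since komi only enters in black_win_prob, the same score logits can be
evaluated or trained under several komi values without changing the network.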


On Thu, Oct 26, 2017 at 3:55 PM, Roel van Engelen <gosuba...@gmail.com>
wrote:

> @Gian-Carlo Pascutto
>
> Since training uses a ridiculous amount of computing power, I wonder if it
> would be useful to make certain changes for future research, like training
> the value head with multiple komi values <https://arxiv.org/pdf/1705.10701.pdf>
>
> On 26 October 2017 at 03:02, Brian Sheppard via Computer-go <
> computer-go@computer-go.org> wrote:
>
>> I think it uses the champion network. That is, the training periodically
>> generates a candidate, and there is a playoff against the current champion.
>> If the candidate wins more than 55% of the playoff games, then a new
>> champion is declared.
>>
>> Keeping a champion is an important mechanism, I believe. That creates the
>> competitive coevolution dynamic, where the network is evolving to learn how
>> to beat the best, and not just the most recent. Without that dynamic, the
>> training process can go up and down.
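
A minimal sketch of the gating rule described above, assuming the playoff
is scored as a plain win fraction. The function name should_promote and the
game counts in the usage note are hypothetical; only the 55% threshold
comes from the message.

```python
def should_promote(candidate_wins, games_played, threshold=0.55):
    """Return True if the candidate should replace the current champion.

    The candidate is promoted only when its win fraction in the playoff
    is strictly greater than the threshold.
    """
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return candidate_wins / games_played > threshold
```

For example, should_promote(221, 400) is True, while should_promote(220, 400)
is False because exactly 55% does not exceed the threshold.
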
>>
>> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
>> Behalf Of *uurtamo .
>> *Sent:* Wednesday, October 25, 2017 6:07 PM
>> *To:* computer-go <computer-go@computer-go.org>
>> *Subject:* Re: [Computer-go] Source code (Was: Reducing network size?
>> (Was: AlphaGo Zero))
>>
>> Does the self-play step use the most recent network for each move?
>>
>> On Oct 25, 2017 2:23 PM, "Gian-Carlo Pascutto" <g...@sjeng.org> wrote:
>>
>> On 25-10-17 17:57, Xavier Combelle wrote:
>> > Is there some way to distribute learning of a neural network ?
>>
>> Learning as in training the DCNN: not really, unless there are
>> high-bandwidth links between the machines (AFAIK, unless the state of the
>> art has changed?).
>>
>> Learning as in generating self-play games: yes. Especially if you update
>> the network only every 25 000 games.
>>
>> My understanding is that this task is much more bottlenecked on game
>> generation than on DCNN training, until you get quite a few machines
>> that generate games.
>>
>> --
>> GCP
>> _______________________________________________
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go