On Fri, Oct 20, 2017, 21:48 Petr Baudis <pa...@ucw.cz> wrote:

>   Few open questions I currently have, comments welcome:
>
>   - there is no input representing the number of captures; is this
>     information somehow implicit or can the learned winrate predictor
>     never truly approximate the true values because of this?
>

They are using Chinese rules, so prisoners don't matter. A capture simply
means there are fewer stones of one color on the board.
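Roughly, under area (Chinese) scoring you'd count like this, which is why no
separate prisoner count is ever needed (a rough sketch; count_stones and
count_territory are made-up helper names):

def area_score(board, komi=7.5):
    # Chinese scoring: stones on the board plus surrounded empty points.
    # Captured stones are simply absent from the board, so they never
    # have to be tracked separately.
    black = count_stones(board, 'B') + count_territory(board, 'B')
    white = count_stones(board, 'W') + count_territory(board, 'W') + komi
    return black - white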


>   - what ballpark values for c_{puct} are reasonable?
>

The original paper has the value they used, but this likely needs tuning. I
would tune it with a supervised network to get started, though you need games
for that. Does it even matter much early on? The network is random :)


>   - why is the dirichlet noise applied only at the root node, if it's
>     useful?
>

It's only used to get some randomness into the move selection, no? It's not
actually useful for anything besides that.
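Something like this, applied to the root priors only (a rough numpy sketch;
epsilon = 0.25 and alpha = 0.03 are the values the paper gives for Go):

import numpy as np

def add_root_noise(priors, epsilon=0.25, alpha=0.03):
    # Mix Dirichlet noise into the root move priors:
    # P(s,a) = (1 - epsilon) * p_a + epsilon * eta_a, with eta ~ Dir(alpha).
    priors = np.asarray(priors, dtype=np.float64)
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * priors + epsilon * noise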


>   - the training process is quite lazy - it's not like the network sees
>     each game immediately and adjusts, it looks at last 500k games and
>     samples 1000*2048 positions, meaning about 4 positions per game (if
>     I understood this right) - I wonder what would happen if we trained
>     it more aggressively, and what AlphaGo does during the initial 500k
>     games; currently, I'm training on all positions immediately, I guess
>     I should at least shuffle them ;)
>

I think the laziness may be related to the concern that reinforcement learning
methods can easily "forget" things they had learned before. The value network
training also prefers positions drawn from distinct games, since successive
positions within one game are strongly correlated.
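In code terms it is roughly a sliding replay window (a sketch only; the buffer
layout, window size and batch size are just illustrative):

import random

def sample_batch(game_buffer, batch_size=2048, window=500_000):
    # Sample positions uniformly from the most recent `window` self-play
    # games, so each minibatch draws only a handful of (decorrelated)
    # positions from any single game.
    recent = game_buffer[-window:]
    batch = []
    for _ in range(batch_size):
        game = random.choice(recent)      # pick a recent game
        position = random.choice(game)    # pick one position from it
        batch.append(position)
    return batch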


-- 

GCP
