Why can't you reuse the same self-played games but score them with a
different komi value? The policy network does not use the komi to
choose its moves, so it should make no difference.
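For example, if each self-play record stores the raw board score
margin (Black minus White, before komi), relabelling the games for a
new komi is trivial. A minimal sketch in Python, assuming such a
record format (the names here are made up for illustration):

    # Relabel stored self-play games for a new komi, assuming each
    # record keeps the raw area score margin (Black - White, before
    # komi). Black wins when the margin exceeds the komi.
    def relabel_for_komi(raw_margins, komi):
        return [1.0 if margin - komi > 0 else 0.0
                for margin in raw_margins]

    # A game Black won by 8 points on the board:
    # relabel_for_komi([8], 6.5) -> [1.0]  (Black wins under komi 6.5)
    # relabel_for_komi([8], 9.5) -> [0.0]  (Black loses under komi 9.5)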


> On 21/03/2017 21:08, David Ongaro wrote:
>>> But how would you fix it? Wouldn't you need to retrain your value
>>> network from scratch?
>>
>> I would think so as well. But some months ago I already made a
>> proposal on this list to mitigate that problem: instead of training
>> a different value network for each komi, add a “Komi adjustment”
>> value as an input during the training phase. That should be much
>> more effective, since the win/loss evaluation shouldn’t change for
>> many (most?) positions under small adjustments, but the resulting
>> value network (when trained across different komi adjustments) has
>> a much greater range of applicability.
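
Concretely, I read that proposal as something like the following (a
rough PyTorch-style sketch of my own, not David's code; the layer
sizes and names are arbitrary):

    import torch
    import torch.nn as nn

    class ValueNetWithKomi(nn.Module):
        """Value net taking the komi as an extra scalar input,
        so one network covers a range of komi values."""
        def __init__(self, planes=17, size=19):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(planes, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
            # +1 for the komi scalar appended to the board features
            self.head = nn.Sequential(
                nn.Linear(32 * size * size + 1, 256), nn.ReLU(),
                nn.Linear(256, 1), nn.Tanh())  # win/loss in [-1, 1]

        def forward(self, board, komi):
            x = self.conv(board).flatten(1)
            x = torch.cat([x, komi.unsqueeze(1)], dim=1)
            return self.head(x)

Feeding the komi as a constant extra input plane instead of a scalar
would also work and keeps the head fully convolutional.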
>
> The problem is not the training of the network itself (~2-4 weeks
> of letting a program someone else wrote run in the background, the
> easiest thing in computer go), or whether you use a komi input or
> a separate network; the problem is getting data for the different
> komi values.
>
> Note that if getting data is not a problem, then a separate network
> would perform better than your proposal.
>
> --
> GCP

