Why does this pose a problem? Presumably the Monte Carlo evaluator
will give the same position a similar score, assuming it has enough
time. This would just produce a duplicate training pattern, or two
training patterns with identical inputs and slightly different outputs.
I guess I don't quite
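For concreteness, here is a rough Python sketch of one way duplicates
like that could be merged, by averaging the targets before training;
the position keys and data layout are made up purely for illustration.

    from collections import defaultdict

    def collapse_duplicates(samples):
        """Merge training patterns that share the same input position.

        samples: iterable of (position_key, score) pairs, where
        position_key is any hashable board encoding (e.g. a Zobrist
        hash) and score is the Monte Carlo evaluation in [0, 1].
        Two patterns with identical input and slightly different
        output become one pattern whose target is their average.
        """
        totals = defaultdict(lambda: [0.0, 0])
        for key, score in samples:
            totals[key][0] += score
            totals[key][1] += 1
        return [(key, s / n, n) for key, (s, n) in totals.items()]

    # The same position scored twice by the evaluator:
    data = [("pos_a", 0.61), ("pos_a", 0.64), ("pos_b", 0.30)]
    print(collapse_duplicates(data))   # pos_a collapses to 0.625 over 2 samples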
I think there is something to this; it seems like it should be
possible to use a database of randomly selected positions from games,
along with the best known follow-up, and use that as a faster way of
testing a program's strength than playing full games. Such a database
would be valuable for all
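A minimal sketch of how such a strength test might look, assuming the
database is just a list of (position, best known move) pairs and the
program exposes some genmove-style function; all the names here are
hypothetical:

    def score_against_database(genmove, database):
        """Cheap strength estimate from a problem-set database.

        genmove:  function mapping a position to the move the program
                  would play there.
        database: iterable of (position, best_known_move) pairs drawn
                  from real games.

        Returns the fraction of positions where the program agrees with
        the best known follow-up, which is far faster to compute than
        playing full games.
        """
        agreed = total = 0
        for position, best_move in database:
            total += 1
            if genmove(position) == best_move:
                agreed += 1
        return agreed / total if total else 0.0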
On May 17, 2007, at 8:17 AM, Brian Slesinsky wrote:
A weakness of this approach is that sometimes the best move depends on
how you plan to follow it up; a program that plays the theoretically
best move but doesn't know how to follow it up is weaker than a
program that plays safer moves.
I
What you would have after your training/evaluator phase is heuristic
knowledge of possibly better Monte Carlo trees to consider. This will
definitely cut down on the search space, but it could also exclude a strong
search path. I have been thinking along these same lines for some time. The
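One way to get the pruning benefit without shutting any line out
completely is to treat the learned knowledge as a soft prior rather
than a hard filter. A rough Python sketch, where prior() is a
hypothetical stand-in for whatever the trained evaluator produces:

    import random

    def select_move(moves, prior, epsilon=0.1):
        """Choose a move to explore, biased by learned knowledge.

        moves:   legal moves in the current position.
        prior:   function giving each move a non-negative heuristic
                 weight (the knowledge distilled from earlier search).
        epsilon: fraction of the time a move is chosen uniformly at
                 random, so even a low-prior line is never excluded.
        """
        if random.random() < epsilon:
            return random.choice(moves)
        weights = [prior(m) for m in moves]
        if sum(weights) <= 0:
            return random.choice(moves)
        return random.choices(moves, weights=weights, k=1)[0]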
On Thu, 2007-05-17 at 12:17 -0400, George Dahl wrote:
Imagine if you had a Monte Carlo program that took almost no time to
run. You would use it to do heavy playouts for another Monte Carlo
program to make it even stronger.
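To make the quoted idea concrete: a "heavy" playout just replaces the
uniform random move choice inside each simulation with the cheap
learned policy. A Python sketch, with the game interface left abstract
(all of these functions are assumed to be supplied by the engine):

    import random

    def playout(position, legal_moves, play, is_terminal, result, policy=None):
        """Run one simulation from `position` to the end of the game.

        With policy=None this is a plain light playout (uniform random
        moves); passing a fast learned policy turns it into a heavy
        playout that steers each simulation toward plausible moves.
        legal_moves, play, is_terminal and result are assumed to be
        provided by the engine.
        """
        while not is_terminal(position):
            moves = legal_moves(position)
            move = policy(position, moves) if policy else random.choice(moves)
            position = play(position, move)
        return result(position)   # e.g. 1.0 for a win, 0.0 for a loss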
I tried something like this as a test with simple Monte Carlo. I
I find Monte-Carlo Go a fascinating avenue of research, but what pains
me is that a huge number of simulations are performed each game, and at
the end of the game the results are thrown out. So what I was
thinking is that perhaps the knowledge generated by the simulations
could be collapsed in
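One simple way to keep that knowledge, sketched here in Python purely
for illustration, is to fold every playout result into a persistent
table keyed by position hash, so later searches or a training phase can
reuse the accumulated statistics instead of discarding them:

    from collections import defaultdict

    class SimulationMemory:
        """Accumulates playout results across moves and games.

        Keys are position hashes (e.g. Zobrist hashes); values are
        running win/visit counts, so the simulations are not thrown
        away when a search finishes.
        """
        def __init__(self):
            self.stats = defaultdict(lambda: [0.0, 0])   # hash -> [wins, visits]

        def record(self, visited_hashes, outcome):
            # outcome: 1.0 if the playout was a win, 0.0 if a loss.
            for h in visited_hashes:
                self.stats[h][0] += outcome
                self.stats[h][1] += 1

        def value(self, position_hash):
            wins, visits = self.stats.get(position_hash, (0.0, 0))
            return wins / visits if visits else 0.5   # unseen -> even

        def training_patterns(self, min_visits=10):
            """Positions seen often enough to be worth learning from."""
            return [(h, w / n) for h, (w, n) in self.stats.items()
                    if n >= min_visits]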