Hi, long-time lurker and occasional poster here,

Thank you for the paper. I hope you don't mind me asking a few very basic
questions, since I am having trouble understanding exactly what you are
doing.

Let's say we are using a linear classifier. Then our output (the predicted
move) should look like:

argmax_i y[i], where y[i] = w1[i] · m1 + w2[i] · m2 + b

where each w[i] is a weight vector for output location i on the board, the
m's are the (column) input vectors for the two previous moves (which I
assume are 1 at the move's location and zero elsewhere), and b is the bias
term.
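Concretely, here is a tiny sketch of what I picture the prediction step
looking like (all the names, the 9x9 board size, and the per-location bias
are my own guesses, not from the paper):

```python
import numpy as np

N = 81  # board locations on a 9x9 board, just for illustration

w1 = np.zeros((N, N))  # one weight vector per output location, for input m1
w2 = np.zeros((N, N))  # likewise for input m2
b = np.zeros(N)        # bias term (one per output location, I assume)

def one_hot(move):
    """Input vector: 1 at the move's location, zero elsewhere."""
    m = np.zeros(N)
    m[move] = 1.0
    return m

def predicted_move(loc1, loc2):
    """Compute y[i] = w1[i] . m1 + w2[i] . m2 + b[i] and return argmax_i y[i]."""
    m1, m2 = one_hot(loc1), one_hot(loc2)
    y = w1 @ m1 + w2 @ m2 + b
    return int(np.argmax(y))
```

Is that roughly the model, or am I already off the rails here?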

To train our classifier online, we want to do something like: (1) Generate a
prediction for a training example. (2) Calculate the error. (3) Update the
feature weights. (4) Repeat.
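In code, I imagine steps (1)-(3) look roughly like this perceptron-style
sketch (entirely my guess -- the paper's actual update rule and learning
rate may well differ):

```python
import numpy as np

N = 81  # board locations (9x9, for illustration)
LEARNING_RATE = 0.1  # made-up value

w1 = np.zeros((N, N))  # weights on the previous move
w2 = np.zeros((N, N))  # weights on the move before that
b = np.zeros(N)        # per-location bias

def predict(prev, prev_prev):
    """Step 1: score every location given the two previous moves, take the argmax.
    (w1[:, prev] is the same as w1 @ one_hot(prev).)"""
    y = w1[:, prev] + w2[:, prev_prev] + b
    return int(np.argmax(y))

def train_example(prev, prev_prev, actual):
    """Steps 2-3: compare the prediction to the training example; on an error,
    reward the correct move's features and penalize the wrong prediction's."""
    predicted = predict(prev, prev_prev)
    if predicted != actual:  # error is 1 when wrong, 0 when right
        w1[actual, prev] += LEARNING_RATE
        w2[actual, prev_prev] += LEARNING_RATE
        b[actual] += LEARNING_RATE
        w1[predicted, prev] -= LEARNING_RATE
        w2[predicted, prev_prev] -= LEARNING_RATE
        b[predicted] -= LEARNING_RATE
```

Step (4) would just call train_example on each example as it arrives.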

If I understand, online training happens during the course of one game, as
we are playing. Moreover, we are using our classifier to generate moves to
select in the first phase of our simulation, as a replacement for MCTS, and
before playouts.

Now this is where I have to start guessing the details. Are our training
examples playouts, and is our error function just 0 if the playout wins, and
1 if it loses? And as we run more playouts, the classifier will update its
weights and select a different sequence of moves in the first phase of our
simulation (analogous to selecting different paths down the search tree
based on node scores in MCTS)? And when we use up our allotted time for one
turn we just return the next move (from the current position) that our
classifier predicts, based on its current weights?
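If that guess is right, then the per-playout update might look something
like this (pure speculation on my part; `reinforce` and the win/loss sign
convention are my own invention):

```python
import numpy as np

N = 81  # board locations (9x9, for illustration)
LEARNING_RATE = 0.1  # made-up value

w1 = np.zeros((N, N))  # weights on the previous move
w2 = np.zeros((N, N))  # weights on the move before that
b = np.zeros(N)        # per-location bias

def reinforce(chosen, prev, prev_prev, won):
    """Nudge the chosen move's score up after a winning playout (error 0),
    down after a losing one (error 1)."""
    sign = 1.0 if won else -1.0
    w1[chosen, prev] += sign * LEARNING_RATE
    w2[chosen, prev_prev] += sign * LEARNING_RATE
    b[chosen] += sign * LEARNING_RATE
```

So over many playouts the weights drift toward moves that tend to win, the
way node scores drift in MCTS. Is that the right picture?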

The paper says we fix the number of moves we select with the classifier
before running playouts (unlike starting from the root and expanding in
MCTS). This is where things start getting really fuzzy for me. Do we
propagate the result of a playout back up this sequence? That is, if we get
a win, do we perform a classifier update for each two-move context in the
full sequence?
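To make my question concrete, here is the kind of propagation I have in
mind (again my own sketch, with made-up names and a made-up learning rate):

```python
import numpy as np

N = 81  # board locations (9x9, for illustration)
LEARNING_RATE = 0.1  # made-up value

w1 = np.zeros((N, N))  # weights on the previous move
w2 = np.zeros((N, N))  # weights on the move before that
b = np.zeros(N)        # per-location bias

def update_sequence(moves, won):
    """After a playout, walk over the classifier-selected move prefix and
    update the weights for every (two-previous-moves -> move) example in it."""
    sign = 1.0 if won else -1.0
    for t in range(2, len(moves)):
        chosen, prev, prev_prev = moves[t], moves[t - 1], moves[t - 2]
        w1[chosen, prev] += sign * LEARNING_RATE
        w2[chosen, prev_prev] += sign * LEARNING_RATE
        b[chosen] += sign * LEARNING_RATE
```

Is that what happens, or is only the first move of the sequence updated?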

I would really like to get to the deeper questions about interpreting what
is really going on, but I first need to make sure I am on the right track
here. Sincere apologies for the stupid questions. I really hope my
understanding didn't get derailed so early on that most of my questions in
this message are gibberish. But I did want to show that I actually made a
concerted effort to understand the paper before asking what on earth it is
all about!

On Tue, Jun 28, 2011 at 7:30 PM, Peter Drake <[email protected]> wrote:

> It doesn't beat RAVE, but it's an interesting result. Our paper will appear
> at the International Conference on Artificial Intelligence (ICAI) in Las
> Vegas:
>
> https://webdisk.lclark.edu/drake/publications/sylvester-icai-2011.pdf
>
> Peter Drake
> http://www.lclark.edu/~drake/
>
>
>
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>