The initial value of Q is not very important because Q+U is dominated by the U piece when the number of visits is small.
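For anyone who wants to experiment with this, here is a toy sketch of the selection rule at a single node. It is not taken from the AGZ paper or from any real engine. The function names, the ten-move prior, c_puct = 1, and the assumption that every visit backs up the same value of -0.9 (a losing position, as in Brian's example quoted below) are all made up for illustration. The only thing it varies is the Q assigned to an edge with N(s,a) = 0.

```python
import math

def puct_select(priors, visits, value_sums, q_init, c_puct=1.0):
    """Return the index of the edge maximising Q + U at one node.

    Edges with zero visits have no average W/N yet, so they use q_init.
    """
    sqrt_total = math.sqrt(sum(visits))
    best, best_score = 0, -math.inf
    for a, (p, n, w) in enumerate(zip(priors, visits, value_sums)):
        q = w / n if n > 0 else q_init          # mean action-value, or the default
        u = c_puct * p * sqrt_total / (1 + n)   # exploration term from the prior
        if q + u > best_score:                  # ties keep the lowest index
            best, best_score = a, q + u
    return best

def run(q_init, sims=10, leaf_value=-0.9):
    """Toy search: one node, ten moves, every visit backs up leaf_value."""
    priors = [0.55] + [0.05] * 9   # hypothetical policy output
    visits = [0] * 10
    value_sums = [0.0] * 10
    for _ in range(sims):
        a = puct_select(priors, visits, value_sums, q_init)
        visits[a] += 1
        value_sums[a] += leaf_value
    return visits

visits_zero = run(q_init=0.0)     # unvisited edges start at Q = 0
visits_parent = run(q_init=-0.9)  # unvisited edges start at the parent's Q
print(visits_zero)
print(visits_parent)
```

With these particular numbers, starting at Q = 0 spreads the ten simulations over all ten moves, while starting at the parent's Q keeps all ten on the move with the large prior. How much that difference matters in practice will depend on c_puct, the priors, and how many simulations you run.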
On Sun, Dec 3, 2017 at 3:39 PM, Brian Lee <brian.kihoon....@gmail.com> wrote:

> It should default to the Q of the parent node. Otherwise, let's say that
> the root node is a losing position. Upon choosing a followup move, the Q
> will be updated to a very negative value, and that node won't get explored
> again - at least until all 362 top-level children have been explored and
> revealed to have negative values. So without initializing Q to the parent's
> Q, you would end up wasting 362 MCTS iterations.
>
> Brian
>
> On Sun, Dec 3, 2017 at 3:25 PM <computer-go-requ...@computer-go.org> wrote:
>
>> Today's Topics:
>>
>>    1. action-value Q for unexpanded nodes (Andy)
>>    2. Re: action-value Q for unexpanded nodes (Álvaro Begué)
>>    3. Re: action-value Q for unexpanded nodes (Andy)
>>    4. Re: action-value Q for unexpanded nodes (Rémi Coulom)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sun, 3 Dec 2017 08:53:02 -0600
>> From: Andy <andy.olsen...@gmail.com>
>> Subject: [Computer-go] action-value Q for unexpanded nodes
>>
>> I don't see the AGZ paper explain what the mean action-value Q(s,a) should
>> be for a node that hasn't been expanded yet. The equation for Q(s,a) has
>> the term 1/N(s,a) in it because it's supposed to average over N(s,a)
>> visits. But in this case N(s,a)=0 so that won't work.
>>
>> Does anyone know how this is supposed to work? Or is it another detail AGZ
>> didn't spell out?
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Sun, 3 Dec 2017 10:44:00 -0500
>> From: Álvaro Begué <alvaro.be...@gmail.com>
>> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
>>
>> I am not sure where in the paper you think they use Q(s,a) for a node s
>> that hasn't been expanded yet. Q(s,a) is a property of an edge of the
>> graph. At a leaf they only use the `value' output of the neural network.
>>
>> If this doesn't match your understanding of the paper, please point to the
>> specific paragraph that you are having trouble with.
>>
>> Álvaro.
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Sun, 3 Dec 2017 10:27:16 -0600
>> From: Andy <andy.olsen...@gmail.com>
>> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
>>
>> Figure 2a shows two bolded Q+U max values. The second one is going to a
>> leaf that doesn't exist yet, i.e. not expanded yet. Where do they get that
>> Q value from?
>>
>> The associated text doesn't clarify the situation: "Figure 2: Monte-Carlo
>> tree search in AlphaGo Zero. a Each simulation traverses the tree by
>> selecting the edge with maximum action-value Q, plus an upper confidence
>> bound U that depends on a stored prior probability P and visit count N for
>> that edge (which is incremented once traversed). b The leaf node is
>> expanded..."
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Sun, 3 Dec 2017 17:57:51 +0100 (CET)
>> From: Rémi Coulom <remi.cou...@free.fr>
>> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
>>
>> They have a Q(s,a) term in their node-selection formula, but they don't
>> tell what value they give to an action that has not yet been visited. Maybe
>> Aja can tell us.
>>
>> ------------------------------
>>
>> End of Computer-go Digest, Vol 95, Issue 5
>> ******************************************
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go