Re: [Computer-go] Computer-go Digest, Vol 12, Issue 79

Hendrik Baier Wed, 26 Jan 2011 05:44:36 -0800

Hi Aja,

that's a good question. At least for the LGR policy without forgetting (https://webdisk.lclark.edu/drake/publications/drake-icga-2009.pdf), only using the first appearance of a reply did not significantly differ in performance. A possible explanation could be that in cases where the same move by the same player appears twice in a playout, the first stone must have been captured, and therefore the answer to the second play is the one that really influences the final position/result. I'm not sure I repeated this experiment with LGRF, but I did try dismissing the tails of playouts (with the rationale that there might be too much noise) and ignoring stones that would later be captured (with the rationale that those moves might be bad on average). Both variants were significantly weaker than plain LGRF.

It's only a few lines of code, test it and see if it makes a difference for your playout policy and program architecture. Stronger playout policies than Orego's will have different interactions with LGRF. You could even try saving several sets of replies per intersection, for the first, second, third appearance of the previous move in a playout, in the hope of capturing certain tactical situations with sacrifices. But I don't expect much.
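
To make the two variants concrete, here is a minimal sketch of an LGRF-1 style table update in Python. It is not Orego's code: the names `update_lgrf1`, `replies`, and `keep_first`, and the representation of a playout as a list of (player, move) pairs, are assumptions made for illustration. By default a later reply overwrites an earlier one; `keep_first=True` uses only the first appearance of each previous move.

```python
from typing import Dict, List, Tuple

Move = int                              # hypothetical: index of a board intersection
Replies = Dict[int, Dict[Move, Move]]   # player -> {previous move -> stored reply}


def update_lgrf1(replies: Replies,
                 playout: List[Tuple[int, Move]],
                 winner: int,
                 keep_first: bool = False) -> None:
    """Update the reply tables in place after one finished playout.

    playout is a list of (player, move) pairs in the order they were played.
    """
    seen = set()  # (player, previous move) pairs already handled in this playout
    for (_, prev_move), (player, move) in zip(playout, playout[1:]):
        key = (player, prev_move)
        if keep_first and key in seen:
            continue  # variant: ignore later appearances of the same previous move
        seen.add(key)
        table = replies.setdefault(player, {})
        if player == winner:
            table[prev_move] = move      # store (or overwrite) the good reply
        elif table.get(prev_move) == move:
            del table[prev_move]         # forgetting: the stored reply just lost


# Player 0 plays 5 twice (the first stone was captured); player 1 answers 7, then 9.
playout = [(0, 5), (1, 7), (0, 5), (1, 9)]

last = {}
update_lgrf1(last, playout, winner=1)
print(last[1])   # {5: 9}

first = {}
update_lgrf1(first, playout, winner=1, keep_first=True)
print(first[1])  # {5: 7}
```

Keeping several sets of replies per intersection would mean keying the table on (previous move, appearance count) instead of the previous move alone.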


Hendrik

On 26.01.2011 14:13, Aja wrote:

Hi Hendrik,

Thanks.
Congratulations, you have done really nice work. I checked your thesis. My result is consistent with yours for LBR-2: no benefit at all, so I took it out. I adapted LGR-1 to Erica's softmax policy. Basically, I am tuning the probability offset by checking some artificial test positions. With 3000 playouts, it now scores around 57% after 500 games, close to my target of 60% (my intuition is that LGR-1 should already help a lot). :)
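
One possible reading of a "probability offset" in a softmax playout policy is sketched below in Python. This is not Erica's implementation: the function names, the `scores` dictionary, and the idea of adding a fixed `bonus` to the score of the stored reply before normalizing are all assumptions for illustration.

```python
import math
import random
from typing import Dict, Optional


def softmax_with_reply_bonus(scores: Dict[int, float],
                             reply: Optional[int],
                             bonus: float) -> Dict[int, float]:
    """Return move probabilities, with `bonus` added to the stored reply's score.

    scores maps each legal move to a raw policy score (hypothetical values).
    reply is the stored last good reply, or None if the table has no entry.
    """
    adjusted = {m: s + (bonus if m == reply else 0.0) for m, s in scores.items()}
    top = max(adjusted.values())  # subtract the maximum for numerical stability
    exps = {m: math.exp(s - top) for m, s in adjusted.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


def sample_move(probs: Dict[int, float], rng=random) -> int:
    """Draw one move according to the given probabilities."""
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]


# Three equally scored moves; move 2 is the stored reply and gets a bonus of 1.0.
probs = softmax_with_reply_bonus({1: 0.0, 2: 0.0, 3: 0.0}, reply=2, bonus=1.0)
move = sample_move(probs)
```

Under this reading the stored reply only becomes more likely rather than being played deterministically, and `bonus` is the value that would be tuned.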
Actually, I have one question, as I still can't figure out your reasoning. In a playout, why do you overwrite the earlier replies with the later ones? Using the earliest one looks more reasonable to me.
Aja

----- Original Message -----
From: "Hendrik Baier"
To: <[email protected]>
Sent: Wednesday, January 26, 2011 5:00 PM
Subject: Re: [Computer-go] Computer-go Digest, Vol 12, Issue 79
Hi Aja,
I would be interested in your results. I think the LGRF policy is only a small first step into the direction of more adaptive playouts (and hopefully the overcoming of the horizon effect). As for the Last-Bad-Reply idea, you can read about my experiences with this and related policies in my Master's thesis, if you're interested. It contains the idea that resulted in the "Power of Forgetting" paper as well. http://www.ke.tu-darmstadt.de/lehre/arbeiten/master/2010/Baier_Hendrik.pdf
regards,
Hendrik
I admit that it's difficult for me to include such a deterministic default policy. :-) With a softmax policy, using the information of "last-LOST-reply" may be a good direction.
Aja
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

