I am coming late into this thread and apologize for the intrusion, but I feel like I'm missing something here, so I thought I'd offer my first thought.
If the number of processors is fixed (i.e., you don't have to worry about it increasing or decreasing over the course of the game), why not partition the board? A naive partition (say, with 4 processors: processor one can only examine first moves in the upper-left quadrant of the board, processor two only in the upper-right quadrant, etc.) has the problem that the only important moves might all be clustered at a single processor.

A smarter partition avoids this problem with high probability, as long as the number of first moves worth considering is on the order of the number of processors: randomly partition the board (it can be a fixed random partition that never changes over the course of the game, if you want). With P processors, assign every point on the board a random floating-point value, sort the points by this value, assign the first 1/P fraction of points to processor #1, the next 1/P fraction to processor #2, and so on. This can all be done before the game even starts.

The rule is then that the first move examined by a given processor must come from its assigned point list. It should be very easy to enforce this efficiently (you're just randomly selecting the first move from a smaller list; all moves after the first are unrestricted).

s.

On Tue, Apr 5, 2011 at 4:50 AM, Daniel Shawul <[email protected]> wrote:
> Oh no, I actually agree with you. But I share Joona's concerns about the
> kind of algorithm to be used. He probably used a root parallelization
> scheme like I just did. It scales almost perfectly on a cluster of 16
> CPUs nps-wise. But then I realized this might not be as good as it seems,
> because the different processors might not produce significantly
> different trees. In fact, at first I wrongly seeded the random number
> generator with time(NULL), which gave me almost the same tree in all the
> processors.
> Then I modified that to include the processor id, and it starts producing
> different trees, but I still think there is a lot of redundancy in there.
> If I still use the same exploration constant UCTK, all processors are
> going to explore to more or less the same depth. So the benefit will be a
> wider (i.e., safer) search, not necessarily a deeper one. The improvement
> from 16x more computing time (i.e., a shared tree) should be much more
> than what I get from this simple parallelization scheme. I am trying to
> introduce more selectivity by dividing UCTK by the number of processors N
> (i.e., UCTK / N). Does that sound reasonable?
>
> On Tue, Apr 5, 2011 at 3:25 AM, Gian-Carlo Pascutto <[email protected]> wrote:
>>
>> Those were single processor tests.
>>
>> I wanted to point out that your "speed is not a very big issue" guess is
>> wrong and that the heaviness of the playouts doesn't change that either.
>> Strength scales strongly with playouts/speed. So something is limiting
>> your result, and a bad parallelization can very well be it.
>>
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Daniel Shawul
>> Sent: Tuesday, April 05, 2011 1:56 AM
>> To: [email protected]
>> Subject: Re: [Computer-go] UCT parameters and application to other games
>>
>> Something I am not clear about is the algorithm used. Or is it just a
>> test on a single-processor version, giving it more time to search? If
>> the tree is not shared, as in the 'root parallelization' scheme, I have
>> doubts it can search 'deeper' even though nps scales almost perfectly.
>> By deeper, I mean more expansion of the best moves. Since the processors
>> are not searching the same tree, it seems to me the parallel search is
>> wider (more alternatives) but not necessarily deeper.
>> Or is this taken care of by using a lower exploration constant (c) for
>> the parallel search, in the hope that it would fill the void introduced
>> by adding more selectivity?
>>
>> On Mon, Mar 28, 2011 at 5:49 AM, Gian-Carlo Pascutto <[email protected]> wrote:
>>
>> >However one of my early observations with Go programming has been
>> >that the speed is not a very big issue in this field (at least in lower
>> >levels).
>>
>> >[...]
>>
>> >10x more processing power only gave around 100 extra elo points.
>> >
>> >Of course with heavy playouts and more fine tuned programs things can
>> >be very different...
>>
>> No. You probably have some other problem or bugs. See for example this
>> study:
>>
>> http://cgos.boardspace.net/study/13/index.html
>>
>> Leela Lite was using light playouts. The slope is a bit flatter than for
>> the real program, but not much more so. You should get far more than
>> 100 ELO for a 10x speedup.
>>
>> --
>> GCP
>>
>> _______________________________________________
>> Computer-go mailing list
>> [email protected]
>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
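To make the partition idea at the top of this message concrete, here is a minimal Python sketch of the fixed random partition and the restricted first-move selection. The board size, function names, and fallback behavior are my own illustrative assumptions, not code from any actual engine:

```python
import random

def partition_board(board_size, num_procs, seed=42):
    """Randomly split all board points into num_procs near-equal lists.

    A fixed seed makes the partition reproducible, so it can be computed
    once before the game starts and never change afterwards.
    """
    rng = random.Random(seed)
    points = [(x, y) for x in range(board_size) for y in range(board_size)]
    # Shuffling then taking strided slices is equivalent to assigning each
    # point a random float, sorting by it, and cutting into 1/P chunks.
    rng.shuffle(points)
    return [points[i::num_procs] for i in range(num_procs)]

def pick_first_move(proc_id, partition, legal_moves, rng=random):
    """First move for this processor must come from its assigned point list."""
    legal = set(legal_moves)
    allowed = [m for m in partition[proc_id] if m in legal]
    if not allowed:  # fall back if none of this processor's points are legal
        allowed = list(legal_moves)
    return rng.choice(allowed)

# Sanity check: every point assigned exactly once on a 9x9 board.
partition = partition_board(board_size=9, num_procs=4)
assert sum(len(p) for p in partition) == 81
assert len({pt for part in partition for pt in part}) == 81
```

Only the first move of each playout is restricted; after that, every processor searches the full board as usual, so the scheme costs nothing beyond the one-time setup.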
