Fuming,

I am far from an expert, but I did find two papers that deal with topics
similar to what you are working on. One I unfortunately do not have access to
(but would be interested in reading); for the other I do not have an official
reference (perhaps the author can elaborate).

http://www.computer.org/portal/web/csdl/doi/10.1109/ReConFig.2009.75
Describes an FPGA implementation of a Go algorithm

http://www.gggo.jp/publications/gpw08-private.pdf
Describes a simulation-server based parallel tree-search implementation

Perhaps those two give you a hint about where to publish your paper. A third
option is the special issue of an IEEE journal now in preparation (although the
submission deadline seems to have passed, see
http://ieee-cis.org/_files/SpecialIssueMonteCarloAndGoFinal.pdf).

René

On Thu, Jun 17, 2010 at 7:13 AM, Fuming Wang <[email protected]> wrote:

> Hi Rene,
>
> Your guesses about our FPGA implementation are quite to the point. The 167
> games are moving through the 167 pipelined stages of one module instead of
> 167 modules.
>
> As this material is a cross between digital circuit design and computer
> gaming, I am not quite sure which refereed journal is most suitable for it.
> Do you or any readers of this list have any suggestions?
>
> Thanks,
> Fuming
>
>
>
> On Thu, Jun 17, 2010 at 8:09 AM, René van de Veerdonk <
> [email protected]> wrote:
>
>> Hi Fuming,
>>
>> Thanks for your answer, it makes much more sense to me now.
>>
>> We are using pipelining in different ways. When I referred to it for a
>> CPU-based single-threaded application, I was thinking about speculative
>> execution. If I understand it correctly, that does not exist in FPGAs, as
>> these are advertised as deterministic in their execution and process flow.
>> In the FPGA case, I imagine that pipelining refers to "unrolling the
>> program", and having different boards physically move across the chip from
>> module to module, as if they are on a production line, all in various states
>> of simulation (board #...@module #101: black to move; board #12@ module
>> #100: white to move; etc.).
>>
>> How you have designed your program in detail would be an interesting read;
>> there are a lot of high-level design trade-offs that you must have dealt
>> with. These will be very different from how you would do it for a CPU-based
>> program. One difference that I imagine, for instance, is the length of the
>> simulation. A CPU-based program stops when the game ends (or you exceed some
>> limit, or you force an early decision, or ...), whereas for FPGA you may end
>> up with a fixed game-length (ready or not, i.e., no early out option) and
>> you may have to simulate pass moves until you reach the end of the
>> "production line" in case the game ended early (is this what you do?). In
>> any case, your impressive numbers suggest that this can be done very
>> efficiently. How you harness all this raw simulation power in a tree-search
>> is yet another research topic that is very interesting and almost
>> orthogonal. Do you think your approach could be mapped to a GPU as well? In
>> any case, I hope you will make a pre-print available to this list when the
>> time comes.
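The "production line" picture above can be sketched as a toy simulation. Everything here is hypothetical: `run_line`, the random game lengths, and the idea that each stage plays exactly one move. It only illustrates the fixed-length idea, where a game that ends early is padded with passes until it reaches the end of the line.

```python
import random
from collections import deque


def run_line(num_stages, num_boards, seed=0):
    """Toy fixed-length production line (an assumption, not the real design).

    One board is seeded per cycle, every board advances one stage per cycle,
    and one board is harvested per cycle once the line is full.
    """
    rng = random.Random(seed)
    line = deque([None] * num_stages)  # line[i] is the board in stage i
    harvested = []
    seeded = 0
    cycles = 0
    while len(harvested) < num_boards:
        # The board leaving the last stage is harvested.
        finished = line.pop()
        if finished is not None:
            harvested.append(finished)
        # Seed a new board into the first stage if any remain.
        if seeded < num_boards:
            # Hypothetical natural game length, never longer than the line.
            length = rng.randint(num_stages // 2, num_stages)
            line.appendleft(
                {"id": seeded, "length": length, "moves": 0, "passes": 0}
            )
            seeded += 1
        else:
            line.appendleft(None)
        # Each stage plays one move, or a pass if the game already ended.
        for board in line:
            if board is None:
                continue
            if board["moves"] < board["length"]:
                board["moves"] += 1
            else:
                board["passes"] += 1
        cycles += 1
    return harvested, cycles


boards, cycles = run_line(num_stages=10, num_boards=50)
wasted = sum(b["passes"] for b in boards)
print(f"{len(boards)} boards in {cycles} cycles, {wasted} padded pass moves")
```

In this model every board spends exactly `num_stages` cycles in the line, so the padded passes are the price of having no early-out option.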
>>
>> In another response in this thread, you mention that you are simulating
>> 167 boards in parallel. Does that mean that you unrolled your program for 167
>> moves, moving a board between 167 separate modules every "cycle" and
>> seeding/harvesting one complete board per "cycle"? Or do you have multiple
>> (shorter) production lines in parallel? Or something else entirely?
>>
>> As you may have noticed, I am looking forward to your paper,
>>
>> René
>>
>> On Tue, Jun 15, 2010 at 7:03 PM, Fuming Wang <[email protected]> wrote:
>>
>>> Hi Rene,
>>>
>>> Our design is fully pipelined, so we are able to simulate multiple games
>>> simultaneously. The way in which simulations are run on an FPGA and on a CPU
>>> is quite different, so a direct comparison is not easy. If we want to simulate
>>> just one game, the FPGA implementation is not 10x faster; however, if we want
>>> thousands of games simulated for a single board position, then the FPGA is 10x
>>> faster. So, we are getting 1500k GAMES/sec, but only in the second sense.
>>> The clock rate of our FPGA board is only 125 MHz, so with a better board/chip,
>>> we will still have a 10-100 times improvement over the current result.
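The two senses of "faster" above (one game versus thousands of games) correspond to the latency and the throughput of a pipeline. Below is a toy model, assuming each move needs one pass through every pipeline stage, and using the 167-stage and 110 moves/playout figures mentioned elsewhere in this thread. The function name and the model itself are illustrative assumptions, not a description of the actual design.

```python
def games_per_second(clock_hz, stages, moves_per_game, games_in_flight):
    """Toy pipeline model (assumption): one move costs one pass through all
    stages, and each stage can hold at most one game at a time."""
    in_flight = min(games_in_flight, stages)
    cycles_per_game = stages * moves_per_game  # latency of a single game
    return in_flight * clock_hz / cycles_per_game


CLOCK_HZ = 125e6  # board clock quoted above
STAGES = 167      # pipeline depth mentioned elsewhere in this thread
MOVES = 110       # assumed moves per 9x9 playout

single = games_per_second(CLOCK_HZ, STAGES, MOVES, games_in_flight=1)
full = games_per_second(CLOCK_HZ, STAGES, MOVES, games_in_flight=STAGES)

print(f"one game at a time: {single:,.0f} games/sec")
print(f"pipeline full:      {full:,.0f} games/sec")
print(f"speed-up from filling the pipeline: {full / single:.0f}x")
```

Under these assumptions the latency of one game does not change at all; only the number of games finished per second grows with the number of games in flight.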
>>>
>>> best,
>>> Fuming
>>>
>>>
>>> On Wed, Jun 16, 2010 at 1:28 AM, René van de Veerdonk <
>>> [email protected]> wrote:
>>>
>>>> Fuming,
>>>>
>>>> Could you please explain your approach a little bit? From the numbers
>>>> you quote, this sounds extremely positive, but I have a hard time
>>>> understanding how you achieve them. Taking 100k playouts/sec for 9x9 on my
>>>> 2.4 GHz laptop for my single-threaded, bitmap-based light-playout
>>>> implementation as an example, with 110 moves/playout, this results in a
>>>> little less than 240 clock-cycles/move. When I quickly looked up the Cyclone
>>>> III specification, I saw that the clock speed for this FPGA tops out around
>>>> 240 MHz, yet you achieve 15x the throughput, i.e., you are 150x more
>>>> efficient. This means 1.8 clock-cycles/move. Without being able to make use
>>>> of pipelining inside the CPU (someone measured ~2 assembly
>>>> instructions/clock-cycle for my bitmap approach), this leads me to some
>>>> questions. First, are you running a single-threaded application, or playing
>>>> on multiple boards at once? Second, are you just replaying moves, or also
>>>> generating them on the fly (about half of the time is spent there in my
>>>> implementation, more if you include updating the data structure to make
>>>> that possible)? Third, are we using the same definitions?
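The cycle budget in the paragraph above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name `cycles_per_move` is made up for illustration, and the 240 MHz figure is the approximate Cyclone III ceiling quoted above, not the clock of the actual board.

```python
def cycles_per_move(clock_hz, playouts_per_sec, moves_per_playout):
    """Clock cycles available per simulated move at a given throughput."""
    return clock_hz / (playouts_per_sec * moves_per_playout)


MOVES_PER_PLAYOUT = 110  # figure quoted above for 9x9 light playouts

# CPU: 2.4 GHz laptop at 100k playouts/sec.
cpu = cycles_per_move(2.4e9, 100e3, MOVES_PER_PLAYOUT)
# FPGA: assumed 240 MHz ceiling at 1500k playouts/sec.
fpga = cycles_per_move(240e6, 1500e3, MOVES_PER_PLAYOUT)

print(f"CPU:   {cpu:.1f} cycles/move")
print(f"FPGA:  {fpga:.2f} cycles/move")
print(f"ratio: {cpu / fpga:.0f}x")
```

With these inputs the script gives about 218 cycles/move for the CPU and about 1.5 for the FPGA, a ratio of 150, which is of the same order as the rounded figures quoted above.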
>>>>
>>>> For instance, I would find it much easier to believe that you achieved
>>>> 1500k moves/second instead of 1500k playouts/sec (with
>>>> each playout being ~110 moves). 200 clock-cycles/move sounds doable if you
>>>> can avoid branching, memory lookups, or miscellaneous calculations by
>>>> creating fine-level parallelism in your FPGA code and specializing
>>>> functions on a per-grid-point basis. In a CPU-based application, this results
>>>> in code bloat that will become counter-productive at some stage, may not be
>>>> feasible in all instances, and is more difficult to maintain. For an
>>>> FPGA-based application, however, this sounds entirely possible (though I
>>>> know nothing about FPGAs).
>>>>
>>>> Thanks,
>>>>
>>>> René van de Veerdonk
>>>>
>>>>
>>>> On Sat, Jun 12, 2010 at 10:37 AM, Fuming Wang <[email protected]> wrote:
>>>>
>>>>>
>>>>> Cyclone III
>>>>> 120,000 logic elements
>>>>> Cycle time is linear in the number of moves to finish a game, which is
>>>>> approximately linear in the square of the board size.
>>>>>
>>>>> Fuming
>>>>>
>>>>>
>>>>>> - What FPGA? Virtex-6? Spartan-6?
>>>>>> - What size is the core in LUTs?
>>>>>> - Is your cycle time linear in the board size or in the number of
>>>>>> squares (i.e. quadratic in board size)? Or something else?
>>>>>>
>>>>>> --
>>>>>> GCP
>>>>>> _______________________________________________
>>>>>> Computer-go mailing list
>>>>>> [email protected]
>>>>>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>>>>>>
