Re: [computer-go] GPGPU

2009-09-14 Thread Vincent Diepeveen

Hi Petr,

I missed this posting of yours from Feb 10.

On Feb 10, 2009, at 1:44 AM, Petr Baudis wrote:


  Hi!

  There has been some talk about implementing monte-carlo playouts on
GPUs in the past, I have heard rumours about Polish bachelor student
doing libego - GPGPU conversion as a project, etc. but I know of
nothing concrete ever materializing - is anybody aware of anything?



Of course, as we all realize, a lot of people have already tried all kinds of things on GPGPU, and obviously Nvidia hardware came first.

For chess, at the moment, this is more complicated than for go, as a shared hashtable is really important in chess. It will become really important in go as well.

Whether the current generation from Nvidia or AMD can reach the optimum of the claimed performance is of course an interesting question today. However, it's not so relevant in the long term; what really matters is the question: what is theoretically possible here?


We can simply assume that future generations improve, be it Nvidia, AMD, or some other manufacturer (Intel?).

With some extra overhead it's quite possible to have a shared hashtable. All of those things I have solved here on paper.

The really interesting issue is of course things like branches. How do you replace them with nearly branchless code?

That's not so easy, and it simply costs overhead. In the case of pattern matching, it's obvious that on x86 you can skip a lot of patterns cheaply with a single compare that fails. So you lose at most 30 cycles or so; probably a lot LESS, 15 or so.

Not on a GPGPU. As compensation, however, GPGPUs can push through more instructions.

Theoretically we can compare. Take a quadcore, say an i7 965:

3.2GHz * 4 instructions a cycle * 4 cores = 51.2G instructions per second = 51.2 GIPS

AMD: 320 cores * 5 execution units = 1600 execution units * 0.97GHz * 1 instruction a cycle = 1552 GIPS

NVIDIA: 480 stream cores * 1.5GHz * 1 instruction a cycle = 720 GIPS

Note that Nvidia sometimes counts two per cycle because of the floating-point multiply-add, so I don't want to claim AMD is superior to Nvidia here or vice versa: it's a totally different architecture to start with, and obviously AMD cannot do multiplication on all 5 units of each full core. So the GFLOPS count is something totally different from what we do, namely game tree search.

For me a factor of 2 here is really not interesting to discuss, so I really don't want to get into a discussion of which of the two architectures is faster for game tree search at this moment. I might when I get sponsored by one of the manufacturers, though :)


The interesting thing is that, compared to AMD, the fastest Intel quadcore chip is on paper, say, a factor of 30 slower if we look at raw GFLOPS.


For all kinds of GFLOPS research we know that the x86s really get far over 90% efficiency, while the best claim ever for serious software that runs in large volumes on GPUs is 40% efficiency.

Scientists who therefore claim more than, say, a factor of 15 speed difference between a GPU and an x86 are not objective, if not committing complete fraud on paper.

To my estimate it would be possible to do things at a factor-5 loss. So you're then 6 times faster than a quadcore. As I like numbers that are a multiple of 5 in this case, I round it down to 5.

Whatever complex calculation I do taking into account all factors, after months of study over a period of nearly 3 years now, I come down to a factor of 5 each time, and have for a number of years.

Achieving this is not simple. Let's first look at whether it is also possible to get a memory bandwidth advantage of a factor of 5.

If we look at memory bandwidth, this factor you can sustain in real time, nonstop, 24 hours a day: the claimed memory bandwidth of the i7 is, say, 20-30GB/s, and the top-end GPUs now are definitely more than a factor of 5 faster in bandwidth. AMD has GDDR5 there (not sure whether Nvidia already has it, but if not they'll follow soon), so the GPUs have, so to speak, RAM two generations newer than the quadcores. The difference in RAM alone already means a factor of 4 for the GPUs; add to that the larger number of memory controllers they use compared to CPUs, and the fact that a lot gets done in a decentralized manner on GPUs, whereas CPUs have a centralized L3 cache in front of the RAM.

So bandwidth-wise it is quite possible to be 5 times faster in practice, up to 10 times if you're a bit more optimistic.


However, getting this out of the GPUs is not so easy for game tree search. Optimizing at a low level is already really complicated. Here you also need at least a two-layer parallel model, and you have to figure out exactly where the chip can get a high IPC (number of instructions per cycle) and how to achieve it.

We have now had, say, 30 years of optimizing for processors in a sequential manner with branches. Programming code with nearly no branches is a totally different league. Using the

Re: [computer-go] CUDA and GPU Performance

2009-09-13 Thread Vincent Diepeveen


On Sep 13, 2009, at 10:19 AM, Petr Baudis wrote:


On Sun, Sep 13, 2009 at 01:02:40AM +0200, Vincent Diepeveen wrote:


On Sep 10, 2009, at 12:55 AM, Michael Williams wrote:


Very interesting stuff.  One glimmer of hope is that the memory
situations should improve over time since memory grows but Go
boards stay the same size.



Well you first have to figure out how fast or slow shifting is on
the nvidia's before actually going for it :)


Just read the nVidia docs. Shifting has the same cost as addition.



Document number and url?


P.S.: The PTX assembler is also documented. Or have you meant some
secret undocumented instruction goodies?

Petr Pasky Baudis
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/




Re: [computer-go] CUDA and GPU Performance

2009-09-13 Thread Vincent Diepeveen

A document from two weeks ago where they at least write *something*; not bad from Nvidia, knowing they soon have to give lessons to top coders :)

It's not really a systematic approach, though. We want a list of all instructions with their latencies and throughput. Lookup times for the caches, shared memory and RAM would also be great to know, even if they were only bandwidth numbers.

I do see references to sections B and C for a multiplication instruction and a memory fetch instruction that seem to exist, but I can't find those sections at all.

Yet nowhere in the document do I see which hardware instructions the Nvidia hardware supports.


Mind giving page number?

Vincent

On Sep 13, 2009, at 11:43 AM, Petr Baudis wrote:


On Sun, Sep 13, 2009 at 10:48:12AM +0200, Vincent Diepeveen wrote:


On Sep 13, 2009, at 10:19 AM, Petr Baudis wrote:

Just read the nVidia docs. Shifting has the same cost as addition.



Document number and url?


http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/NVIDIA_CUDA_Programming_Guide_2.3.pdf



P.S.: The PTX assembler is also documented. Or have you meant some
secret undocumented instruction goodies?


http://www.nvidia.com/object/io_1213955209837.html

--
Petr Pasky Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth





Re: [computer-go] CUDA and GPU Performance

2009-09-12 Thread Vincent Diepeveen

Thanks for sharing this, Christian; my comments are inline.

On Sep 9, 2009, at 5:54 PM, Christian Nentwich wrote:



I did quite a bit of testing earlier this year on running playout  
algorithms on GPUs. Unfortunately, I am too busy to write up a tech  
report on it, but I finally brought myself to take the time to  
write this e-mail at least. See bottom for conclusions.


For performance testing, I used my CPU board representation, and a  
CUDA port of the same (with adjustments), to test the following  
algorithm:

 - Clear the board
 - Fill the board according to a uniform random policy
 - Avoid filling eyes, according to simple neighbour check
 - Avoid simple ko
 - Count the score and determine the winner

In other words: no tree search is involved, and this is the  
lightest possible playout. The raw numbers are as follows:
 - CPU Search: 47,000 playouts per CPU core per second, on an Intel  
6600 Core-2 Duo
 - GPU Search: 170,000 playouts per second, on an NVidia Geforce  
285 card




Interesting to know would be how many nodes per second this is hammering on the local shared memory, and which IPC you got overall. The 9% might not be very accurate; read further for that.


Nvidia is of course the weakest of all GPU manufacturers currently if you run on the consumer GPUs rather than the Tesla. Nvidia is of course most famous because they were the first to bring GPGPU to the big masses and promote it. Note that in the 90s I already stumbled upon implementations of chess programs on graphics cards by hardware designers who did this as a hobby; at the time those cards were higher clocked than CPUs on many occasions.

Nvidia is not really revealing what instructions the GPUs have, so this is probably a major bottleneck in your software. To start with, it is quite possible that Nvidia's entire 'GFLOPS' figure is based upon combined instructions, such as multiply-add.

So a stream core can in principle do at most 1 instruction a cycle, yet the numbers Nvidia throws around are based upon things counted as 2 instructions (so 8 flops) whereas in reality it is 1 instruction on the GPU.


If you're not using multiply-add in your program, then that already hammers down the theoretical gains one can achieve on Nvidia.

Other manufacturers seem more open here.

Note that privately approaching Nvidia also didn't help. This was for a potential pilot project a while ago for simulation software (simulating anything from generics up to fighter jets). Obviously a pilot project is just a pilot project, not some massive deal. Nvidia indicated they would only *perhaps* reveal something in the case of a paper deal for really a lot of Tesla cards.

As we know those are a bit pricey (though not for the simulator clients), so a deal of hundreds or thousands of pre-ordered cards would not work, of course. A pilot project is there to convince people something is possible.

We know that the bandwidth of the consumer GPUs for GPGPU work is a lot worse than that of the Tesla versions from Nvidia, which is the problem throughout your story. However, I see solutions there.

Also, your result is not so bad. Most likely you wrote quite efficient software in CUDA. So what limited your nps, so to speak, is basically that your program does very little work per node.


When using more knowledge, it is quite possible that you hardly slow down, as the memory latency is what you keep hitting anyway. So maybe you can make the program more knowledgeable without loss.

Also, what I am missing from your side is some technical data on how many bytes each stream core looks up in shared RAM at each node. For example (I might be confusing device RAM here), I thought the cacheline size is 256 bytes per read for each core.

So if you use considerably less than that, it's seriously slower of course.


The algorithm running on the GPU is a straight port, with several  
optimisations then made to severely restrict memory access. This  
means the algorithm is a naive sort of parallel algorithm,  
parallel on a per-board level like the CPU implementation, rather  
than per-intersection or some other sort of highly parallel algorithm.


Memory access other than shared processor memory carries a severe  
penalty on the GPU. Instead, all threads running on the GPU at any  
one time have to make do with a fast shared memory of 16384 bytes. So:
 - The board was compressed into a bit board, using 2*21 unsigned  
ints per thread


2 * 21 * 32 = 1344 bits

Maybe it is the case that the GPU is very slow at bit manipulation; these things have not been designed for cryptography-style bit twiddling.


 - The count of empty, white and black intersections and the ko  
position was also in shared memory per thread


Doesn't it speed things up near the edges to do things dirt cheap and illegal, and just gamble that illegal results don't propagate back to the root? (This since the playing strength is still really weak.)

Re: [computer-go] CUDA and GPU Performance

2009-09-12 Thread Vincent Diepeveen


On Sep 9, 2009, at 11:57 PM, Christian Nentwich wrote:


Mark,

let me try to add some more context to answer your questions. When  
I say in my conclusion that it's not worth it, I mean it's not  
worth using the GPU to run playout algorithms of the sort that are  
in use today. There may be many other algorithms that form part of  
Go engines where the GPU can provide an order-of-magnitude speedup.  
Still more where the GPU can run in parallel with the CPU to help.


In my experiments, a CPU core got 47,000 playouts per second and  
the GPU 170,000. But:
  - My computer has two cores (so it gets 94,000 playouts with 2  
threads)


Later-generation quadcores hardly have a higher IPC than the Core 2; the Core 2 Duo has a brilliant IPC.

Nehalem is hardly faster than Phenom II or Core 2 for Diep. It's really the compiler quality and tricks with turbo boost that lure the audience. For example, the experimental test machine at test sites gets cooled down to far under 20C, all components of the entire machine, as power consumption increases about 10% when moving from 25C to 50C; ever seen a machine at home that's cooler than 50C? Additionally the turbo boost gets manually overclocked to +600MHz, and the RAM is a type you can't really afford, and so on. Of course these are multibillion companies and every single one of them tries to outdo the others to look better.

So really you should compare them 1 to 1 power-wise.

The GTX 285 is then not so impressive. It's on par with quadcore Nehalems in terms of GFLOPS per watt. I wouldn't call it an outdated GPU, as it is a fast GPU, but for GPGPU it obviously is slow.

The latest AMD GPU is, however, 4 times better here.

So your result is at most a factor of 2 off for the Core 2 playouts, on a chip that in other areas is on par with Nehalem. You beat it by a factor of 2 there.

8-core machines are not a fair comparison, as those have 2 sockets. You should compare that with the 4-Tesla setup, which has 960 stream cores.

The only fair Nvidia comparison with quadcores is the Tesla. Now, I realize it is nearly $1800 apiece, which is a lot for a GPU on steroids, yet that's the fair comparison, to be honest.

If we compare, let's compare fairly. An 8-core Nehalem is the maximum number of cores Intel can deliver in a single fast machine. I'm skipping the single-memory-controller 24-core box from a year ago (Dunnington).

The 8-core setup you really should compare with 4 Tesla cards, so that's 960 cores.


In reality you buy a 300-euro card now. What do Intel or AMD offer for 300 euro? Not a 3.2GHz i7-965, that's for sure, as that thing costs 1000+ euro.

So effectively you lose at most a factor of 2; your implementation on the Nvidia is still scaling better then.


As for the parallel speedup one would get out of game tree search with so many threads versus 4 fast cores: this is a real issue. Yes, that's not so efficient yet.

However, there is a solution there to get a good speedup (at least for chess) that I figured out on paper. If there is one solution, I bet there are more, including solutions for computer go. The problem, as always, is getting funded to carry out something like that, as software on a GPU doesn't sell, of course.

  - My computer's processor (intel core duo 6600) is 3 years old,  
and far from state of the art
  - My graphics card (Geforce 285) on the other hand, is recently  
purchased and one of the top graphics cards


That means that my old CPU already gets more than twice the speed  
of the GPU. An Intel Nehalem processor would surely beat it, let  
alone an 8-core system. Bearing in mind the severe drawbacks of the  
GPU - these are not general purpose processors, there is much you  
can't do on them - this limits their usefulness with this  
algorithm. Compare this speedup to truly highly parallel  
algorithms: random number generation, matrix multiplication, monte- 
carlo simulation of options (which are highly parallel because  
there is no branching and little data); you see speedups of 10x to  
100x over the CPU with those.




Matrix multiplication is not THAT easy to do well. Initial attempts were 20% efficient on Nvidia GPUs when using faster approximations (so not the naively simple manner, which is terribly slow, but FFT-style), and that was for ideally sized matrices. So if it is already that inefficient in the lab, that means there are problems everywhere.

It's maturing rapidly now, however.

The 9% occupancy may be puzzling but there is little that can be  
done about that. This, and the talk about threads and blocks would  
take a while to explain, because GPUs don't work like general  
purpose CPUs. They are SIMD processors meaning that each processor  
can run many threads in parallel on different items of data but  
only if *all threads are executing the same instruction*. There is  
only one instruction decoding stage per processor cycle. If any  
if statements or loops diverge, threads will be 

Re: [computer-go] CUDA and GPU Performance

2009-09-12 Thread Vincent Diepeveen


On Sep 10, 2009, at 12:55 AM, Michael Williams wrote:

Very interesting stuff.  One glimmer of hope is that the memory  
situations should improve over time since memory grows but Go  
boards stay the same size.




Well, you first have to figure out how fast or slow shifting is on the Nvidias before actually going for it :)

Remember, these are graphics processors, not CPUs. Even on some CPUs, I remember the P4 Prescott, it took 7 cycles or more to shift right.


Vincent



Re: [computer-go] Rules for remote play at the Computer Olympiad

2009-04-05 Thread Vincent Diepeveen
Actually in computer chess it happens only sometimes, and only one team has done it very clearly; that team is not from Europe but from the Middle East / Asia. The odds that someone who hardly earns enough to pay for basic things will cheat are quite a bit bigger than that someone from Western Europe, with an average salary (for example in the Netherlands) of 52000 euro a year for an IT guy, is going to cheat. The former has nothing to lose and is happy with even 1000 dollars' worth of sales in total (and they all overestimate what you can make with a tiny computer-go product).


So except for this non-European team, I'm rather convinced that all the other teams in past years were pretty clean. As a good chess player it's easy to see when someone cheats; in go that's going to be even easier.

Considering how little progress the go program authors have made, other than importing computer chess authors, and considering the problem of people who have nothing to lose, cheating is going to be a much bigger issue.


The huge difference between computer go and computer chess is obviously not only the level of the software, but especially the time: it is 2009 now. There is a range of new possibilities to cheat, either with or without remote machines. Furthermore, away from the public eye, people learn from the computer chess guys how one can cheat without getting detected.

So the only manner to detect cheats is the method I described.

The biggest hidden issue in computer games, both chess and go, is probably stolen source code that gets used by some teams, rewritten to their own data structures.

Note that statistically speaking, considering the money that was at stake for some, there have been very few cheats at computer chess events; in normal sports, with normal people who are a tad less clever, the amount of dope that gets used is a lot bigger. In fact, in a normal sport, without ANY form of dope you don't even make the top 1000.

Vincent

On Apr 4, 2009, at 2:07 PM, Don Dailey wrote:


On Sat, 2009-04-04 at 06:14 -0400, steve uurtamo wrote:


Moreover, this is a really complicated issue.
Yes, and I think cheating will always be possible.   It's like  
cryptography,  nothing is ever unbreakable.


I was quite appalled at how often it happened in computer chess  
when I was active in it,  and there were also incidents of humans  
using computers and having the moves transmitted to them.  And  
of course in correspondence chess I think they had to allow  
computers because the honest players were at a disadvantage.


- Don



There has been some extensive statistical work on human cheating in chess done by Ken Regan at the University at Buffalo. However, this relies heavily upon the fact that computers dominate human play by a wide margin. The same is not the case in go.

s.

On Sat, Apr 4, 2009 at 1:56 AM, Robert Jasiek jas...@snafu.de wrote:

Vincent Diepeveen wrote: If a program under no circumstance can reproduce a specific move, and that for several occasions, then that's very clear proof of course. [...] Statistics prove everything here.

No. Rather it proves that the program cheats OR that the methods of detecting cheating are improper.

One always must have a logfile

Good.

--
robert jasiek





Re: [computer-go] Re: GCP on ICGA Events 2009 in Pamplona

2009-04-03 Thread Vincent Diepeveen


On Jan 14, 2009, at 1:42 PM, Mark Boon wrote:

It's difficult to get hard data about this. Go is only the most  
popular game in Korea. In other countries like Japan and China it  
comes second by far to a local chess variation.


Possibly Chess is more ingrained in Western culture than Go is in  
Asia, I don't know really. But Chess has the population-numbers of  
West vs. East against it. If there are more chess-players than Go- 
players in the world then it won't be by much. But the Go market is  
probably a lot bigger. Look only at the money in professional Go  
tournaments. It's probably an order of magnitude more than the  
money in professional Chess. But I must admit this is just a guess  
of mine.


Mark



Oh la la,

The origins of chess and go aren't far away from each other. Chess originates from India; go not far from there, if you look at it from a global perspective.

Both go and chess are really similar in that they are symmetric games. From a strong player's perspective there isn't that much difference in the intellectual experience of the games.
A good chess player can be a good go player and vice versa. Quite the opposite with the zillions of checkers versions there are; checkers is NOT a symmetric game.

If chess were added again to the Olympic sports (which I doubt will happen, but you never know how political decisions get made), that would be good news for China's women's team. Maybe they lose on board 1 sometimes, but on the other boards they'll win all their games.

Chess is becoming really big in China now (heh, I'm still looking for a girlfriend; know a Chinese female go or chess player?).

I'm quite sure chess is by now bigger in China than go is there. Of course the step from Chinese chess to chess is really tiny as well.

Chess gets played in every nation on the planet. Chess is probably tiniest in Japan; shogi and go seem to be more popular there.

South Korea used to be really tiny for chess as well; a new initiative there might boost it a tad.

Chess' advantage is of course that a game is a lot quicker than go. Now, for serious, strong players that is not an advantage, but for the 'big masses' it is.

Chess computers used to get exported to 105+ nations worldwide. As for the rest of the planet, with the exception of Japan and Korea, go doesn't exist.

There is no doubt that some very successful chess and go players are very, very wealthy. If you're good at such a game, you have good brains of course, and everyone likes to pay you; many chess players even get asked to run a business for some billionaire-type guys. I don't doubt that's identical in go.

Whether one go player has more billions of wealth than a chess player is not very interesting.

As for the 'subtop', there chess is quadratically bigger than go. How many people live from chess? Well, thousands. Amazing, yet true.

Whereas in a few nations like the Netherlands the number of chess players who are members of a federation is declining, realize the tiny size of the nation here: the Netherlands has exactly 16.5 million inhabitants. Even then, each town still has a chess club.

Chess is totally booming in India, China, Turkey and Spain. Together these nations are already nearly 3 billion people.

When I was in China, I saw zero go boards anywhere.



On Jan 12, 2009, at 9:22 AM, steve uurtamo wrote:


i think you might be estimating this incorrectly.

s.

On Sat, Jan 10, 2009 at 9:00 AM, Gian-Carlo Pascutto  
g...@sjeng.org wrote:

Ingo Althöfer wrote:


What prevents you from freezing in your chess
activities for the next few months and hobbying
full (free) time on computer go.


The amount of chess players compared to the amount of go players.

--
GCP









Re: [computer-go] Re: GCP on ICGA Events 2009 in Pamplona

2009-04-03 Thread Vincent Diepeveen

Hi,

You're miscounting here completely again. Counting the number of federation members is a bad idea. Count the number of people who know a game and regularly play it.

Draughts (international 10x10 checkers, using Polish rules) is really tiny.

It is not the culture anymore to become a member of a club in a nation like the Netherlands; the Netherlands is a very weird nation in that respect. Yet many learn the game. You shouldn't let yourself be led by federation numbers, but by the number of people who know a game and play it regularly. Counting active official competition players is simply a bad measure.

The online chess servers are indeed a bad measure too; play is spread over a number of servers. Yahoo has a daily load of about 7000 who are playing nonstop. Chessclub.com, which used to be a paid club and still is, has about a million visitors, yet as membership costs 50 dollars, nearly 500 yuan a year, that's a price I'm not willing to pay either.

Chessclub offers free membership to titled players. Yet I'm titled and I have to pay; how about that? Having two Dutch titles and a FIDE master title seemingly doesn't count :)

The strongest players are at chessclub.com. It's about 2500 nonstop (so 24-hour load) now. Chessbase is becoming a tad larger now, note also a paid chess club, with about a 3000 load. Yet again you have to either buy a product of theirs for 50+ euro to get a free 1-year membership, or pay 25 euro a year for membership.

There is a range of servers; each one has about 500-1000. Most are paid.


There chess is really doing bad of course.
Experience learns when you setup a free server with a good client  
that you soon get a huge load.

5000+ is very normal.

There is a lot of tiny communities with a 400+ players 24 hours a day  
online.


that's what you get when there is 105+ nations playing chess.
Each federation wants their own server.

Yet they do not want to pay for it. FIDE really is doing very bad there.
Their problem would be they go to commercial companies which have all  
their own
interest, instead of using a few of their guys who already work in  
organisations and are objective.


Vincent



On Jan 14, 2009, at 4:06 PM, Mark Boon wrote:



On Jan 14, 2009, at 12:43 PM, Thomas Lavergne wrote:


Counting xiangqi and shogi players as chess players is a bit unfair...


Sorry if I caused confusion, I didn't mean to count those as Chess- 
players. I just stated that to show that despite large population- 
numbers in say China, most of those people play Xiang-Qi rather  
than Wei-Qi.


This in contrast to a large country like Russia where I believe  
Chess is by far the most popular. In Holland however, Chess comes  
only at third place (or maybe even lower) after Bridge and Draughts.


Mark





Re: [computer-go] Rules for remote play at the Computer Olympiad

2009-04-03 Thread Vincent Diepeveen

Hi,

I see there has been some discussion on this list about remote cheating. In computer chess this toleration has grown out of hand. Setting clear and sharp rules there in computer go might avoid a lot of problems in the future.

There is a very simple manner to avoid cheating in go. But let me address a few points first.

1) neural nets

Forget about neural nets and cheating. A year or 12+ ago we had a lot of neural net guys in computer chess as well. ANNs are not even close to representing the human mind, as modern insights into how brains work have shown clearly for quite a while already. Most important is that the automatic learning techniques of neural nets are so totally inefficient that it is really difficult to use them well.

Soon anything that is a neural net will be beaten by the rest of the major league.


The only case I remember that was a tad more stubborn was basically someone who tried to fool the rest; he bought source code from someone and sold it as the 'neural-net-optimized version' of the engine. Yet the original programmer of that code (Joost Buijs) assured me that this program definitely used his parameters and not neural-net-optimized ones, as he could reproduce its every score.

So in that case there was no neural net; it was just getting used as a sales pitch.

Yet the neural nets will get beaten in the major league. If not next year, then some years later. You can't keep improving a product forever without good methods for debugging what it actually is doing.

A black box that is really clever and intelligent doesn't exist.

2) sure on paper it is really easy to cheat.

IT IS ALSO REALLY EASY TO CHEAT WITHOUT REMOTE MACHINES IN FACT.

Oh, in computer chess we've seen it all. There is a certain species of
person on the planet, not present in big numbers, capable of fooling other
people in a professional manner; James Bond is nothing compared to the
sneaky ways they get things done.

For sure a bunch of them will also try it in computer-go.

For these guys, considering how weakly go computers will play for the
coming few years, there is not a big difference between remote and local
machines. It's too easy for them to cheat; communication to and from a
program is too difficult to monitor 100%.


Just keeping the mouse at a certain spot is already enough to cheat, or
having someone a few meters away at his laptop signal something over
Bluetooth or whatever to the playing machine.

It has all been done.

The only really simple way of catching crooks is a good tournament
director who, in case of suspect moves, enforces that an engine must
reproduce the move it played, with a reasonably decent score.


Now some of you will argue loudly in one choir: a parallel search doesn't
reproduce moves. One move can make or break a game, yet those who cheat
have the habit of cheating on many moves per game, and over several games.
So there should be many data points to complain about.


Whether a program can reproduce a move is at the discretion of the
tournament director, who might want to check whether the move in question
has a score very similar to other alternatives (in which case, of course,
you don't know which of the equally or nearly equally scored moves would
get played).

But the principal thing is reproduction of the great moves.

If a program can under no circumstances reproduce a specific move, and
that on several occasions, then that is of course very clear proof.


That is a rule that should be introduced in computer chess too, IMHO.

Note there is also the time constraint, and the search-depth constraint.

One must always have a logfile displaying which iterations or steps a
program has performed. If one searches in a very selective manner, a rather
easy and 'near undetectable' form of cheating is a program playing moves
that, normally speaking, it would not have found on that hardware until
iterations deeper.


In a selective search you can assume that, in a move sequence m0..mN
representing the line one needs to see to find the move, there will of
course be moves in that sequence that might have been reduced somehow.


So if one takes care that such a sequence already gets searched deeper
into the 'hashtable', by manually forcing that sequence, then the program
'learns' that move from the hashtable sooner.

Now in chess this is currently easier than in go, as the search method
used is based on reductions, but it will come to go as well.

Really effective is feeding in your opponent's 'main lines' to be searched
fully by a number of cores.


Yet again, the only way to really detect this is the tournament director
forcing reproduction of the moves. If a system can't reproduce enough of
the great moves played, for whatever reason: bad luck.


For parallel systems whose search is totally non-deterministic, there is
also a simple lemma; 

Re: [computer-go] On Don Dailey's first chess program

2008-11-24 Thread Vincent Diepeveen

Heh Don,

Paranoid attempts to keep hackers from hacking your software :)

On hacking: I found the Fritz5 protection the most ingenious protection
ever.


You just had to modify 2 variables in an ini file to 'hack' it.

All hackers could do this, but the average user had no clue how to edit an
ini file, so all 'normal users' had to buy it (if they didn't know how to
copy CD-ROMs, that is).


So I bet their sales must have been really good, as there was nothing to
'hack'.


Vincent

On Nov 22, 2008, at 7:40 PM, Don Dailey wrote:


On Sat, 2008-11-22 at 17:54 +0100, Ingo Althöfer wrote:

Hello Don,

thx for all your answers.
I think, I found a website where old programs
(from the 1980s and early 90s) are listed:

http://www.septober.de/chess/index.htm#

There are also screenshots of RexChess
http://www.septober.de/chess/pics/9102.gif

and Colossus X (by Martin Bryant)
http://www.septober.de/chess/pics/9001.gif

I think, in those days Martin's programs were the
best-sold chess software in Europe. I heard, he even
bought a Ferrari from the money he earned with
Colossus.

**
Again my wish: When you speak or write about
software piracy in Europe (which really exists): Please,
do not throw all regions or countries or people in one
big pot. I have many friends in Europe who are
honestly paying for their games software. And some
of them (including me) are sensible when someone writes
(openly or between the lines) that software piracy is
a standard in Europe.


That was not my intent.  I think software piracy is standard in all
countries.   It's pretty much accepted practice everywhere.

I booby-trapped one Palm program I wrote (not Ogo).  Sure enough, within
a day or two of me putting it up on PalmGear someone I know discovered a
site where you could get it for free.   Someone thought they had broken
the copy protection.   The easy way to copy-protect a Palm program is to
associate the Palm user's name (which you identify your Palm with) with a
hash key.   The secret key you send them must match the hash of their
user name.

I did that, but it was a decoy.   I arranged so that there were  
routines
which checksummed the entire executable for a somewhat more
sophisticated

test.   I did other things to obfuscate things including making the
obvious hacks look like they had succeeded and putting the test in
several places, but with different looking code.   The obvious hack
would work for a while,  long enough for the hacker to think he had
succeeded.   In short I wanted the hacker to get frustrated,  
thinking he
had succeeded at first, but eventually wondering if he had really  
found

all the traps.  He would never be 100% sure even if he got past the
first one, or second one.

I didn't do anything malicious, but if the program wasn't hacked
correctly it would crash the machine with a message proclaiming that
this was a hacked copy.   The message of course was encrypted and
hopefully difficult to find.   When the machine crashed, it would
require a reset of the machine.  Not just a reboot.

I took the hacked version and tested it to see if the hacker had
actually succeeded.   It looked like he had succeeded, but after  
getting

in and out of the program the required number of times, I got the
message!   Yeah!   I had embarrassed the hacker!

It's not possible of course to fool a determined hacker; no matter what
you do there is a way around it. But my goal was to try to make it not
worth the hacker's trouble.

- Don







Best regards, Ingo

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/




Re: [computer-go] programming languages

2008-10-10 Thread Vincent Diepeveen


On Oct 9, 2008, at 10:39 PM, Don Dailey wrote:


On Thu, 2008-10-09 at 15:20 -0400, [EMAIL PROTECTED] wrote:

Computers + random = can of worms.

What if I get a fast benchmark by implementing the fastest, most
awful, random number generator imaginable? What if every one of my
random playouts looks disturbingly similar?

As to the relevance, the time-eaters in my heavy playouts are
different from the time-eaters in my light playouts.


This is true, but it goes back to the well understood fact that you
cannot have a perfect benchmark.

I think this would be very useful and very relevant,  just not  
perfect.


Random numbers IS an issue.  I required transparency so that any issue
like this can be observed, reviewed, criticized, etc.

However, there is a solution.  I don't think this solution is  
necessary,

but it is possible:  George Marsaglia published a set of very high
quality random number generators that can be implemented with just a
couple of lines of code.   We could specify that a specific random
number generator be used.  After all, this is only a benchmark.  But
these are fast generators.
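For a concrete idea of what such a couple-of-lines generator looks like, here is a sketch of one member of Marsaglia's xorshift family (the (13, 7, 17) shift triple is one of his published 64-bit choices; the move-selection helper is an illustrative assumption, not part of any proposed benchmark):

```c
#include <stdint.h>

/* Marsaglia xorshift64: period 2^64 - 1 for any nonzero seed. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Illustrative helper: uniform-ish index in [0, n). The modulo bias is
   negligible for board-sized n, but a benchmark spec would pin this down. */
static int random_move(uint64_t *state, int n) {
    return (int)(xorshift64(state) % (uint64_t)n);
}
```

Fixing the seed, as Don suggests, makes the whole playout sequence reproducible: the stream of outputs for a given starting `*state` is fully determined.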

The final step, which I don't personally support because I think it's
too restrictive but which is also possible, is to require these programs
to be deterministic.   So you could SPECIFY the exact starting seed for a
random number generator as well as the specific method (or equivalent)
to be used to select moves randomly.

I don't really like that idea, but it makes it possible to perfectly
verify the implementations.   I don't like it simply because it places
additional constraints on creativity and flexibility.   You might come
up with a very clever and fast way to generate uniformly random moves
for instance that doesn't work within the framework.   Or you may want
to benchmark some language that doesn't like 32 bit data types  
(perhaps

31 bit types for instance.)   A lack of flexibility here would unduly
punish you.

So I'm happy to allow some leeway at the expense of perfect  
verification
as long as we have transparency,  anybody who wants can see the  
code and

criticize it or find flaws.   (His implementation is FAST, but he
cheated a little on the random numbers.)

It's also pretty easy to not pick moves uniformly at random - I think
some

on this group may have made that mistake.  This should show up if we
have good benchmarks.  Someone will notice something slightly off  
in the

statistics reported and it will be examined or seen to be flawed.


Hi, I usually hate replies that zoom in on one tiny detail of a post;
apologies for doing that here.


However, I'd like to zoom in on the uniformity of RNGs. Of course we all
make an effort to get uniform RNGs; in fact the net is overloaded with
them. Years ago I googled and stumbled upon RanRot

from Agner Fog: http://www.agner.org/random/theory/

Like all Fibonacci-based RNGs it is spread completely uniformly.

Now THAT has a few problems. When I did some work eight or so years ago
for a guy called Avery Cardoza, he released a game called Casino2000.
Obviously it has roulette as well.

Now if we take the European roulette version, which has a single zero (US
casinos have two zeros, 0 and 00, which makes the odds quite a bit better
for the casino), there is on paper one strategy that might deliver money.
Might, because playing with a system is of course forbidden.


On paper there is one system that wins.

You put down 1 chip; if you lose that bet you double, and if you lose
again you double again.

So you bet

1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1000

Hah, why 1000?

Because usually you won't get further than doubling 10 times.
Each table has a maximum bet.

So let's limit the number of bets to 8.

Seven doublings, in short.

Not a single RNG that is considered 'good', not even the Mersenne
Twister, *approaches* the reality of what happens in casinos. I do not
mean the 'bad' casinos, but simply that the profit you make on paper is a
lot higher than in reality (besides the fact that they'll kick you out of
the casino).

The nature of all those bit-flippers is such that in some algorithms I
nowadays *assume* they can produce a field for me in O(n log n). So not
only are they too uniform, they can also produce an entire field, covering
every element, in O(n log n) without a single failure.

Now THAT'S interesting :)


- Don




- Dave Hillis


-Original Message-
From: Don Dailey [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Thu, 9 Oct 2008 2:44 pm
Subject: Re: [computer-go] programming languages

On Thu, 2008-10-09 at 14:17 -0400, [EMAIL PROTECTED] wrote:

The http://shootout.alioth.debian.org/ site, ...

If we, as a community, could come up with a sufficiently detailed
description of the light playouts algorithm (see the current 'Light
simulation: Characteristic values' thread), there is no reason that
this algorithm couldn't join them.



Many potential ways to speed up light playouts involve 

Re: [computer-go] programming languages

2008-10-10 Thread Vincent Diepeveen

Sure,

A lot faster is RanRot in 64 bits; on a 2.2 GHz K8 with an old GCC it
takes about 3.3 ns per number, so I assume it's quite a tad faster than
that on a Core 2. Note it's quite slow on Itanium, about 9-10 nanoseconds
per number, as it later appeared Itanium has no hardware rotate
instructions :)


Here is how I rewrote it:

/* define parameters (R1 and R2 must be smaller than the integer size): */

#define KK  17
#define JJ  10
#define R1   5
#define R2   3
#define BITBOARD unsigned long long

/* assumed here so the snippet compiles stand-alone */
#ifndef FORCEINLINE
#define FORCEINLINE static inline
#endif
int ProcessNumber;   /* process (or thread) number, set by the host program */

/* global variables Ranrot */
BITBOARD randbuffer[KK+3] = { /* history buffer filled with some random numbers */
  0x92930cb295f24dab, 0x0d2f2c860b685215, 0x4ef7b8f8e76ccae7, 0x03519154af3ec239,
  0x195e36fe715fad23, 0x86f2729c24a590ad, 0x9ff2414a69e4b5ef, 0x631205a6bf456141,
  0x6de386f196bc1b7b, 0x5db2d651a7bdf825, 0x0d2f2c86c1de75b7, 0x5f72ed908858a9c9,
  0xfb2629812da87693, 0xf3088fedb657f9dd, 0x00d47d10ffdc8a9f, 0xd9e323088121da71,
  0x801600328b823ecb, 0x93c300e4885d05f5, 0x096d1f3b4e20cd47, 0x43d64ed75a9ad5d9
  /*0xa05a7755512c0c03, 0x960880d9ea857ccd, 0x7d9c520a4cc1d30f,
    0x73b1eb7d8891a8a1, 0x116e3fc3a6b7aadb*/
};
int r_p1, r_p2;  /* indexes into history buffer */

/*  AgF 1999-03-03
 *  Random Number generator 'RANROT' type B
 *  by Agner Fog
 *
 *  This is a lagged-Fibonacci type of random number generator with
 *  rotation of bits.  The algorithm is:
 *  X[n] = ((X[n-j] rotl r1) + (X[n-k] rotl r2)) modulo 2^b
 *
 *  The last k values of X are stored in a circular buffer named
 *  randbuffer.
 *
 *  This version works with any integer size: 16, 32, 64 bits etc.
 *  The integers must be unsigned.  The resolution depends on the
 *  integer size.
 *
 *  Note that the function RanrotAInit must be called before the first
 *  call to RanrotA or iRanrotA.
 *
 *  The theory of the RANROT type of generators is described at
 *  www.agner.org/random/ranrot.htm
 */

/* rotate left over 64 bits */
FORCEINLINE BITBOARD rotl(BITBOARD x, int r) { return (x << r) | (x >> (64 - r)); }

/* returns a random number of 64 bits unsigned */
FORCEINLINE BITBOARD RanrotA(void) {
  /* generate next random number */
  BITBOARD x = randbuffer[r_p1] = rotl(randbuffer[r_p2], R1) + rotl(randbuffer[r_p1], R2);

  /* rotate list pointers */
  if( --r_p1 < 0 )
    r_p1 = KK - 1;
  if( --r_p2 < 0 )
    r_p2 = KK - 1;
  return x;
}

/* this function initializes the random number generator.  */
void RanrotAInit(void) {
  int i;

  /* one can fill the randbuffer with other values here */

  randbuffer[0] = 0x92930cb295f24000 | (BITBOARD)ProcessNumber;
  randbuffer[1] = 0x0d2f2c860b000215 | ((BITBOARD)ProcessNumber << 12);

  /* initialize pointers to circular buffer */
  r_p1 = 0;
  r_p2 = JJ;

  /* randomize */
  for( i = 0; i < 300; i++ )
    (void)RanrotA();
}

Please note there are faster RNGs than RanRot, with some shifting here and
there. But maybe RanRot is sufficient; you won't search enough nodes in
the coming 10 years to get into trouble with it :)

Note that the parameter 'ProcessNumber' is the process number of each
search process (or thread). It safely works up to 4096 cores :)

On Oct 10, 2008, at 12:36 AM, Don Dailey wrote:


On Thu, 2008-10-09 at 15:20 -0400, [EMAIL PROTECTED] wrote:

Computers + random = can of worms.


Has anyone seen this:

 http://home.southernct.edu/~pasqualonia1/ca/report.html#files

They are claiming impressive speed and high quality for a random number
generator.  The code is compact and small, much nicer than mt19937, and
the speed seems to blow mt19937 out of the water.

I haven't looked at any papers on this and I'm wondering how good  
it is.


Here is a quote:


The cellular automaton outperforms the GSL random number
generators, being more than three times as fast as the GSL
generators.

The following table shows the mean time for 10 runs of each
generator, with each run producing 10 million integers. Source
code for both the GSL generators and the cellular 

Re: [computer-go] Super-duper computer

2008-09-29 Thread Vincent Diepeveen
That ChessBrain is a commercial attempt by a few guys looking for money,
not an attempt to really search in parallel in a decent manner.


This is of course why they kept all their logfiles 'secret'.

I'm not the only one who offered them help, without payment, but that
wasn't accepted at all. Weird in itself, as they have little experience in
parallelizing game-tree search across many processors.


The software they use is run in an embarrassingly parallel manner.

So their speedup is of course really bad.

An 8-core machine will easily out-calculate ChessBrain with respect to
search depth.

It seems to me the project died; a shame in itself.

It is very complicated to get something to work that can play in real
time. Long-term analysis projects are more viable and really interesting
for several reasons. Yet what you see is that most of those projects keep
the source code of the parallel search framework secret, and just publish
an article claiming victory, without data to back that up in a
statistically significant way.

That means, in short, that no one can learn from them, nor objectively
draw conclusions. A team that doesn't put supreme effort into getting a
good speedup will of course not succeed.


Vincent


On Sep 28, 2008, at 2:15 PM, Claus Reinke wrote:


If you're looking for spare processors, how about a [EMAIL PROTECTED]
program for Go?-)


It appears that the Chess community has had such a project already:

ChessBrain: a Linux-Based Distributed Computing Experiment
http://www.linuxjournal.com/article/6929

ChessBrain II - A Hierarchical Infrastructure for Distributed
Inhomogeneous Speed-Critical Computation
IEEE CIG06, Reno NV, May 2006 (6 pages)
(IEEE Symposium on Computational Intelligence and Games)
http://chessbrain.net/docs/chessbrainII.pdf

old project site:
http://chessbrain.net/


From the chessbrainII paper, it seems they considered Go, but

before the recent developments that made parallel processing
promising. The papers might also be interesting for their discussion
of parallel tree search and communication issues.

Claus

Local versions of the top programs could offer to connect to their  
main incarnation's games,
explaining internal state (it is sure it will win, it thinks  
that group is dead, ..) in
exchange for borrowing processing resources. Or, instead of doing  
this on a per-program basis,
there could be a standard protocol for donating processing power  
from machines whose users view a

game online.

That way, the more kibitzes a game attracts, the better the computer
player plays; and if the game cannot hold an audience, the computer
player might start to seem distracted, losing all those borrowed
processors;-)

Mogo might even find some related research at INRIA ([EMAIL PROTECTED]
style (desktop) grid computing, ..), so perhaps there's scope for
collaboration there?

Claus

Q: why do you search for extra-terrestrial intelligence?
A: we've exhausted the local search space.









Re: [computer-go] Analysis of 6x6 Go

2008-09-28 Thread Vincent Diepeveen

Do you also guess that in go the side who begins wins the game?

Vincent

On Sep 22, 2008, at 9:08 PM, Erik van der Werf wrote:

On Mon, Sep 22, 2008 at 7:14 PM, Ingo Althöfer 3-Hirn- 
[EMAIL PROTECTED] wrote:

 Does someone here know of other (documented) attempts
 to solve 6x6 Go?

 Didn't Erik van der Werf do it under his rules?

 He did it for 5x5-Go, see at
 http://erikvanderwerf.tengen.nl/5x5/5x5solved.html


Several 6x6 positions were solved, but not the empty board. E.g.,  
for the following position we could prove a Black win by at least 2  
points (which took about 13 days in 2002).


. . . . . .
. . . # O .
. . # O . .
. . # O . .
. . . # O .
. . . . . .


Optimal play on 6x6 under Chinese rules is expected to give a Black  
win by 4 points.


Erik






Re: [computer-go] Bobby Fischer

2008-09-14 Thread Vincent Diepeveen

Don,

When I play or analyze with a world-top player, say top 10 of the world
(a chance I do not get often in my life), then I can really assure you:
from a social viewpoint, maybe YOU are far better than any of the Russians
I've ever played.

But from a technical chess viewpoint (opening, middlegame, endgame,
strategy, and accuracy), they're on a different level.


The average nerd here will not be able to imagine this. They're *so*  
much better.


A few months ago I had the honour of playing someone from India from the
highest caste; I concluded that he was better than me in every respect:
strategy, tactics, openings, memory (he remembered way, way more), not to
mention the endgame.

Today's world-top players, regardless of whether you speak of a 9-dan go
player or the highest-rated chess players, are a league of their own; as a
human race we can only be proud of having them.

Now, it is because the games have a limited board size and a limited set
of rules defining the play that with software we can, or will, beat them.

Yet you must realize very well that a strong chess player is also a
strong go player, and that a strong go player is a strong chess player.
Note this is NOT the case for checkers.


Go and chess are both SYMMETRIC games. Strategy and tactics are similar;
the indirect way to win at chess and at go is the same. In go it is also
the end goal: maximizing your influence levels. In chess that is similar
to maximizing mobility.

In 10x10 international checkers, the toughest and most widely played
variant of the game, not to be confused with the simplistic and nearly
dead-analyzed 8x8 checkers, the game is asymmetric. Unlike in chess or go,
you cannot play a move to any part of the board; you can only move forward
and never go back. That asymmetric nature of the game is of overwhelming
importance: it totally changes the nature of the game. Typically,
world-champion-level 10x10 checkers players are very poor chess players;
most are rated 1600-1800.

That's not even enough to play in the national competition here.

My club has 4 teams (which is a lot) in the national competition. Not a
single one of the titled checkers players is strong enough to play there.
On the other hand, if a professional go player knocked on my door right
now, "hey, I'm a 5-dan professional go player, I'd like to play some chess
this season", I would directly call the federation: "hey, we have a new
player for our second team."

Within one day of learning the basic rules he or she could already play
easily in the first division. If you realize that this is just one grade
below the professional division (masterclass), and that there are two more
grades (second class and third class) where we also have two teams, you'll
realize the huge difference.

First class is 2100-2350+; a few GMs (Jan Timman) and a few IMs play
there as well.


If you want to know how strong a pro is, play him for money.

I know the double-sided effect of that statement: in chess the big problem
with those chess programs/computers has always been that people got paid
to lose.

This is because most sponsors of matches were always more interested in
winning a match than in a fair competition, or they set up the matches
wrongly. It is cheaper to hire a professional player for a lump-sum fee,
as he then doesn't need to achieve a thing and can relax and do his best
the next day when playing in an all-GM tournament.


The few matches we had where GMs got paid a lot more for winning, the
computer has never in history won. Kramnik demanded 1 million dollars
'advance payment' from a sheikh; the Hydra team paid a lump sum; and
Kasparov played at such a beginner's level against Deep Blue in the second
match, just as he did in the first match (where it was enough to win),
around Elo 2100, that even today I'm not sure why. Two different
explanations reached me. Several Russian GMs (some living in the West)
swore to me that he had bet millions on the computer (he could get good
odds); another official who was there during the match at the side of the
Kasparov team (whose name I won't reveal, as I forgot who it was) told me
clearly there was a 'contractual' issue.


Personally I tend to believe the last explanation more than the first.
Deep Blue was so outdated by 1997, from an algorithmic and evaluation
viewpoint, that under no condition could they have played another match
without looking like total beginners compared to the PC software programs.

They searched 10-12 ply with the thing, as the logfiles also clearly
prove. Two years later, at the open-hardware world championship in 1999,
the quad Xeons (400-500 MHz) searched 14-17 ply hands down, and a lot more
than that in the endgame. I myself reached 20+ ply on a quad Xeon there in
endgames where Deep Blue got 12.

Not a single one of all those matches has ever been objective. Human
players have 1 

Re: [computer-go] Lockless hash table and other parallel search ideas

2008-09-10 Thread Vincent Diepeveen



On Sep 10, 2008, at 2:12 PM, Don Dailey wrote:


The rules are exactly the same for 9x9 as for 19x19.  The boardsize is
different and that changes the game some.

I would suggest that if a top go player plays a game of chess
immediately after first learning the rules,   he would lose very badly
to even mediocre players or even advanced beginners.



Usually I hesitate to comment on just one statement out of an entire
story. Additionally, you have the advantage of native English skill; it is
kind of an Obama-type statement that you can still define an 'advanced
beginner' as a titled player who doesn't make money with it.

But we must correct you here, in case you no longer see yourself as a
beginner or as an advanced beginner.

Directly after learning the game of chess, a strong go player will be
able to win against you.

Strategically and tactically they're so much above your level that they
will completely annihilate you.

I agree that there are big differences between 9x9 and 19x19.

9x9 is a very simple game compared to 19x19.
From a computer-algorithms viewpoint, hard forward pruning is tougher to
do there, explaining why the current random search methods do so badly at
9x9 and a lot better at 19x19, relatively speaking. In an absolute sense
they will fail at both games, of course.


The only proof the current random search methods give in go is that
searching deeply in a random manner is better than searching short lines
in a super-dubious manner. IMHO this is logical.


The fact that there are zero full-time salary prospects to get clever
guys interested in putting years of full-time effort into 9x9 is the
reason the programs still play it so weakly; just a single one of them
would raise the game quality a lot.


It is of course a combination of evaluation and search.

What you typically see is that players with little domain knowledge in
either chess or go are busy with it now, doing a brute-force attempt.

Note that computer-go has one big advantage over computer chess: because
there are few sales to be made, there is little money at stake, which
gives the advantage that the programmers at least communicate with each
other, in a forum like this and at tournaments. In computer chess it is
very difficult to find talkative people.


The main progress in computer chess comes simply from debugging other
people's code. It goes so far that a Russian author has even automatically
converted the program Rybka 1.0 back from assembler into C code; it is
called Strelka.

This communication problem is why progress in computer chess goes
relatively slowly. Still, many brilliant guys get lured into it, because
chess is played in 105+ nations. You see clearly that they do not put much
effort into it nowadays, as, just like in computer-go, there is nearly no
money to be made with an engine (in contrast to a GUI).


Given that, it is self-evident that those who have a parallel go program
still haven't figured out what computer chess already knew in the '80s:
to run in parallel you need to avoid central locking, and you need to
search with a global hashtable (Feldmann, somewhere in the '80s) and a
decentralized form of search. That does involve locking if you want a
good speedup, something neither Leiserson with Cilk nor Feldmann managed,
but it is all very solvable with a big effort.
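One well-known computer-chess technique for exactly this, a shared hashtable without central locking, is the lockless XOR scheme usually credited to Hyatt and Mann: store key ^ data instead of the key, so a half-completed (torn) write by another thread fails a consistency check at probe time. The entry layout and names below are an illustrative sketch, not code from any program discussed here; production code would additionally use real atomics and memory ordering.

```c
#include <stdint.h>

/* One slot of a shared transposition table. Storing key ^ data lets a
   reader detect a torn write (key and data from different positions)
   without taking any lock. */
typedef struct {
    uint64_t key_xor_data;
    uint64_t data;           /* packed score/depth/best move, etc. */
} tt_entry;

#define TT_SIZE (1u << 16)   /* illustrative size; must be a power of two */
static tt_entry table[TT_SIZE];

static void tt_store(uint64_t key, uint64_t data) {
    tt_entry *e = &table[key & (TT_SIZE - 1)];
    e->data = data;          /* two plain stores, no lock */
    e->key_xor_data = key ^ data;
}

/* Returns 1 and fills *data only if the slot is consistent for this key. */
static int tt_probe(uint64_t key, uint64_t *data) {
    tt_entry *e = &table[key & (TT_SIZE - 1)];
    uint64_t d = e->data;
    uint64_t k = e->key_xor_data;
    if ((k ^ d) != key)      /* empty slot, other position, or torn write */
        return 0;
    *data = d;
    return 1;
}
```

The worst case under a race is a rejected probe (a missed hit), never a wrong entry accepted, which is why engines can share one global table across cores without a lock per bucket.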


Yet the big guys who were busy with chess in past years, who already knew
a lot back in the '80s about decentralizing parallel search, seem absent
in computer-go.


Vincent



I really doubt this would be the case with 9x9 go.  I don't think you
can really make a strong argument that 9x9 isn't go or that it's  
not the

same game.You CAN argue that the characteristics of the game are
different and different aspects of the game are emphasized.

Some really strong players may not be specialists in 9x9 and may  
lose to

players who specialize in 9x9 but are otherwise a few stones weaker at
19x19, but that's not remarkable.   In chess you can also be weakened
significantly and be thrown off your game by a surprise opening - or
we could imagine a game where your opening is decided for you and it
would make you very uncomfortable.

My guess (and it's only a guess) is that strong players playing on the
9x9 board are simply very uncomfortable but probably do not play as  
weak

as they imagine.In chess I heard that someone once did a study to
find out if playing speed chess weakened the performance of some  
players
more than others and despite the fact that many players imagine  
that it
does, it turned out that there was a remarkable correlation,  
although no
doubt some players who specialize at different time controls would  
have

an edge.





- Don



On Wed, 2008-09-10 at 11:27 +0900, Hideki Kato wrote:
Christoph Birk: Pine.LNX. 
[EMAIL PROTECTED]:

On Tue, 9 Sep 2008, Olivier Teytaud 

Re: [computer-go] Super-duper computer

2008-09-07 Thread Vincent Diepeveen

Yeah, it won't be long until we've got petaflop performance in PC
hardware. Currently it is double-precision floating-point crunching power
in CELL-type hardware.

In short, using hashtables is going to be very complicated on that type of
hardware, and if you're not using them your efficiency is not so high
(understatement).


Are you sure you can run on CELL CPUs?

Maybe it is time to ponder a more efficient form of search, instead of
piling on inefficiency?

Vincent

On Sep 7, 2008, at 4:40 AM, Darren Cook wrote:

This beast goes online in 2011.  Better start lobbying now for  
some Mogo

time.
http://www.networkworld.com/community/node/32152


By coincidence I was looking at the Top 500 list yesterday and the top
machine already does petaflop (peak) performance [1]. I wonder how  
many

playouts/second Mogo would do on that :-).

Darren

[1]:
http://www.top500.org/blog/2008/06/14/ 
preview_31st_top500_list_world_s_most_powerful_supercomputers_topped_w 
orld_s_first_petaflop_s_system


--
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://darrendev.blogspot.com/ (blog on php, flash, i18n, linux, ...)





Re: [computer-go] mogo beats pro!

2008-08-09 Thread Vincent Diepeveen

Congrats to the MoGo team for getting system time at SARA for a match.

The architecture of the power5/power6 system (in July 2007 a power5 system was installed, and it has since been upgraded to power6) is based upon having sufficient RAM and high bandwidth to I/O (for each Gflop there is a specific amount of I/O bandwidth).


When they opened the machine (I stood in front of it at SARA), you could see that a power6 is basically one big block of metal that eats a huge amount of power. A single rack with a few power6 blocks eats more power than the machine's entire storage (which is a huge wall of hard drives).


Huijgens has 8 dual-core processors per node, 16 cores a node in short, which are very bad at integer work.

For most go/chess/checkers-type applications a single 4.7GHz power6 core performs like a 2.4GHz core2. That is because the power6 focuses on a high number of instructions per cycle for floating point (double precision); the power processor is not so great for integer-heavy programs.

Its IPC (instructions per cycle) is far under 1.0 for most programs.

For that type of hardware I therefore usually did a PGO (profile-guided optimization) run with Diep that lasted 24 hours. After that the compiler has more data points to optimize the software better, which gives a huge speed boost.


Each power6 block is connected to the other nodes with a high-speed network. There are 2 physical network lines; I guess one for I/O and one for communication (RAM).

So the latency and bandwidth from one node to another are really bad compared to the huge crunching power of one node, seen from the viewpoint of RAM communication. A blocked get will take between 5 and 10 microseconds. That's not so fast.


From a calculation viewpoint (instructions per cycle executed), the power6 is not so fast when you need more crunching power than one block. So for example for my chess program, having 1 node would be interesting, as that's a lot faster than a PC. I wouldn't ask for more than 1 node though; the hashtable is too important for a program like Diep. Even though its nps is low compared to other chess programs (it has the biggest evaluation function of all chess programs), it still does 200k nps a core * 16 cores = 3.2 million nps. That's 3.2 million reads and 3.2 million writes to/from a shared global hashtable. Add to that that the network itself will also deliver 6.4 million transactions a second from the other nodes, so that's a grand total of 12.8 million. Suppose we allow this to eat 10% of our system time; then we would require a latency equivalent to handling 12.8 * 10 = 128 million transactions a second full-time, or under 10 nanoseconds per transaction.
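The arithmetic above can be written out as a small sketch; all numbers are the post's own estimates, not measurements:

```python
# Latency budget for a shared global hashtable, using the post's own numbers.

NPS_PER_CORE = 200_000            # Diep: nodes per second per core
CORES = 16                        # one power6 node

local_nps = NPS_PER_CORE * CORES              # 3.2 million nodes/s
local_traffic = 2 * local_nps                 # one read + one write per node
remote_traffic = local_traffic                # post assumes equal incoming traffic
total = local_traffic + remote_traffic        # 12.8 million transactions/s

# If hashtable traffic may eat only 10% of system time, the network must
# sustain 10x that rate, which fixes the per-transaction latency budget:
required_rate = total * 10                    # 128 million transactions/s
budget_ns = 1e9 / required_rate               # under 10 ns

print(f"{total/1e6:.1f}M transactions/s, budget {budget_ns:.1f} ns each")
```

At a measured latency of 5-10 microseconds per blocked get, the network misses that sub-10ns budget by roughly three orders of magnitude, which is the factor-1000 gap the next paragraph states.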


In reality, however, the network is a factor of 1000 slower in latency, demonstrating the latency problem to its full extent when one uses more than 1 node for game tree search.

When using many processors (hundreds), a global shared hashtable is extra important, as hashtable cutoffs prematurely terminate millions of searches that do not need to be done.


The necessity of a global shared hashtable was already noticed clearly by Rainer Feldmann a long time ago.


When not using a shared hashtable for the last 10 plies on 460 processors, the search depth loss is about 6 to 7 ply for Diep at tournament-level time controls.
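A minimal sketch of what such a hashtable cutoff looks like in an alpha-beta searcher; this is generic illustrative code, not Diep's implementation, and the entry layout (depth, bound flag, score) is an assumption:

```python
# Minimal transposition-table cutoff sketch (not Diep's actual code).
# A stored entry lets us terminate a search whose result is already
# known at sufficient depth, instead of re-searching the subtree.

EXACT, LOWER, UPPER = 0, 1, 2
table = {}  # zobrist key -> (depth, flag, score)

def probe(key, depth, alpha, beta):
    """Return a score if the stored entry suffices for a cutoff, else None."""
    entry = table.get(key)
    if entry is None:
        return None
    stored_depth, flag, score = entry
    if stored_depth < depth:
        return None            # entry searched too shallowly to trust
    if flag == EXACT:
        return score
    if flag == LOWER and score >= beta:
        return score           # fail-high cutoff
    if flag == UPPER and score <= alpha:
        return score           # fail-low cutoff
    return None

# Example: a position previously searched to depth 8, exact score +35.
table[0xDEADBEEF] = (8, EXACT, 35)
print(probe(0xDEADBEEF, depth=6, alpha=-50, beta=50))   # 35: cutoff
print(probe(0xDEADBEEF, depth=10, alpha=-50, beta=50))  # None: too shallow
```

On a cluster the same probe goes over the network to whichever node owns the key, which is why the per-transaction latency matters so much.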

The power6 is a magnificent machine for software that can use big RAM. The power6 at SARA has hundreds of gigabytes of RAM.


If your software can work very well on multiple of those nodes, one should consider buying a big Tesla setup for about $10k. It has some thousands of cores clocked at 1.x GHz.

You can already get a card to develop on for just 520 euro.

The number of instructions a second a Tesla can execute is a lot higher than the power6 can do for sequential game-tree-search codes.

A Tesla also gets an IPC of 1.0 hands-down, higher than the power6. Let's use IPC = 1.4 for the core2 at 2.4GHz; that gives the power6 roughly IPC = 0.75.

Those 800 power6 processors execute:
  800 * 0.75 * 4.7GHz = 2.8T instructions per second.

That's quite a lot compared to a simple PC, which of course is a zillion times more efficient thanks to its global shared hashtable.


A single new card from Nvidia, in its Tesla form, does:
  240 cores * 1.5GHz = 0.36T instructions a second.

The Tesla setups combine a bunch of those cards, giving a similar total number of instructions a second.
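Putting the post's peak-throughput estimates side by side (the 1.5GHz Tesla clock matches the figure used earlier in the thread; all of this is peak, not sustained):

```python
# Peak instruction-throughput estimates from the thread (rough numbers).

power6_ips = 800 * 0.75 * 4.7e9   # 800 cores at effective IPC 0.75: ~2.82T
tesla_card_ips = 240 * 1.5e9      # one 240-core card at 1.5GHz, IPC 1.0: 0.36T

cards_to_match = power6_ips / tesla_card_ips
print(f"power6 machine : {power6_ips/1e12:.2f}T instructions/s")
print(f"one Tesla card : {tesla_card_ips/1e12:.2f}T instructions/s")
print(f"cards to match : {cards_to_match:.1f}")
```

About 8 such cards reach the 800-core power6's peak figure, which is why a multi-card Tesla box lands in the same ballpark.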


Assuming, of course, that one isn't using double-precision floating point in the software: a new Nvidia GPU has only 60 processors onboard that can do double precision, not 240.


I'd estimate the search efficiency of MoGo at the same level as Deep Blue's.


Deep Blue wasn't 11 teraflop, not even close. Its hardware was of course much faster than that, seen from an instructions-per-cycle viewpoint.


It averaged 130 million nodes per second. For today's simplistic programs, a single node eats roughly 1000 cycles on a core2.

IPC = 1.4
1000 cycles a node
130 mln nps.
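The post breaks off here; carrying its own three numbers one step further (my arithmetic under the thread's assumptions, not the author's stated conclusion):

```python
# Deep Blue's throughput expressed in core2 terms, per the thread's assumptions.

DEEP_BLUE_NPS = 130e6      # average nodes per second
CYCLES_PER_NODE = 1000     # cost of one node on a core2 (post's estimate)
CORE2_HZ = 2.4e9           # clock used elsewhere in the thread

cycles_per_second = DEEP_BLUE_NPS * CYCLES_PER_NODE   # 1.3e11 cycles/s
core2_equivalent = cycles_per_second / CORE2_HZ       # ~54 cores
print(f"~{core2_equivalent:.0f} core2 cores at 2.4GHz")
```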

Re: [computer-go] Strength of Monte-Carlo w/ UCT...

2008-08-09 Thread Vincent Diepeveen


On Aug 9, 2008, at 9:45 PM, Don Dailey wrote:


I'm curious what you guys think about the scalability of monte carlo

with UCT.

The MCTS technique appears to be extremely scalable.  The theoretical
papers about it claim that it scales up to perfect play in theory.



We agree here that this is of course not true.

You can prove that random forms of search basically search based upon mobility. The insight behind that is really simple: if you control more area you have more possibilities, so the statistical odds of that backing up to the root are bigger than for positions where you control little area.

So a very selective searcher equipped with a good mobility function will beat it, as its worst case is better. As you notice correctly, at a certain point, when programs get really strong in certain areas, the randomness of a search tree will backfire, as it creates a worst case in program-versus-program play.

There are however 2 problems to solve to make it happen, maybe a 3rd:
   a) making a good mobility evaluation function. Not easy in chess nor in go. I must admit that in chess it took me until somewhere in 2004-2005, so more than 11 years after starting with chess, to make one. This as a good chess player who has put in a lot of time there. In go, where the literature is even less accessible to programmers, it will be even harder.
   b) selective searching is not so easy. How many are there who can search ultra-selectively and still improve big-time in playing strength when given more CPU time? Well, other than Leela, that is...


Historic note: we had randomness in chess too. If you just use mobility, nothing else, not even material values, you soon end up above 2000 elo.

Go programs are far from that yet.

So until then you will still have to deal with the busloads of guys who claim neural networks work very well as a replacement for search and/or evaluation.

It took until 1998 or so to find great, well-tuned parameters for material in chess (done by Chrilly Donninger; note that he posted publicly that the tuning didn't matter, just the patterns themselves, an interesting statement at the time).

Now the question is: what will happen when someone finds those for the most dominant positional components in computer go?


Note, Don, that the obvious thing in go, which a lot of humans who are weak at go miss, is that finding the best moves in a position as a subset of all moves is a LOT easier than in chess.


So selective search in go is far easier to do well than in chess. It's relatively easy to select each time just the 30 to 40 moves that are really relevant. If you look at it like that, then in go you can easily achieve far deeper selective search depths that are relevant than in chess, given the same hardware.

In chess you simply cannot afford not to look at even the most stupid line possible, simply because sometimes sacrificing the entire board works. The goal is just conquering the enemy's king.

In go, however, if with a certain degree of sureness you have a won position, there is NEAR ZERO need to look further. Hard pruning works far better there.

Effectively, top go programs therefore get selective search depths similar to the search depths chess programs get at the start of the game, given the same hardware and time controls. This despite the fact that chess programs get far higher NPS figures.
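The depth claim can be made concrete with the usual nodes ≈ b^d relation, so d = log(nodes)/log(b). The node budgets and branching factors below are illustrative assumptions, not figures from the post:

```python
import math

def reachable_depth(nodes, branching):
    """Depth d such that branching**d ~= nodes."""
    return math.log(nodes) / math.log(branching)

# Hypothetical 3-minute budgets: a chess program at 3.2M nps versus a
# go program expanding only 50k tree nodes per second.
chess_nodes = 3.2e6 * 180
go_nodes = 5e4 * 180

print(f"chess, b=35 full width : {reachable_depth(chess_nodes, 35):.1f} ply")
print(f"go, b=35 unpruned      : {reachable_depth(go_nodes, 35):.1f} ply")
print(f"go, b=5 hard pruning   : {reachable_depth(go_nodes, 5):.1f} ply")
```

With hard pruning to an effective branching factor of 5, the go program reaches roughly 10 ply on a 64x smaller node budget, versus under 6 ply for the full-width chess search, which is the effect described above.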


When today's top chess programs run on hardware from 1999 (a quad Xeon 500MHz, which showed up at the world champs) at 3 minutes a move, they totally destroy the field from back then with 100% scores. Not a single draw.


So we can argue a lot about search here. Considering the selective search depths today's go programs reach, I would argue that the most important improvement now is the evaluation function. That doesn't mean it needs to be implemented in the same manner, nor that it needs to be some sort of huge function.

Considering the go strengths of the top authors in computer go, one would argue they had better implement simple knowledge and tune it very well.

That'll kick more butt than a small improvement in search.

People who believe religiously in search depth as the absolute truth are interesting people. If I play today's Diep on hardware from 1999 against the Fritz of those days, one of the stronger programs at the time, Fritz would get 17 ply every move, versus maybe 12 for my Diep.

Hell, I'll even give 7 ply odds when needed. Just let me turn on a few extensions so as not to be TOO weak tactically.


It won't help the Fritz of those days; it loses big-time.

On paper 7 ply of extra search depth is worth 700 elo or so, so today's Diep should have a 0% chance, just on paper.


In chess the center is dominant. Very soon everyone figured out that pieces and pawns in the center are worth more than when not in the center. In go all this is more complicated than in chess. In today's top programs, when they do not know what to do, they just put their (light) pieces and pawns in the

Re: [computer-go] 2008 World 9x9 Computer Go Championship in Taiwan

2008-07-02 Thread Vincent Diepeveen
From 1997 onwards I managed to join the computer chess world champs every year.


Besides the participants Stefan MK (Shredder), Shay Bushinsky and Amir Ban (both Junior), tournament director Jaap van den Herik, and Joke Hellemons, who does the entire organisation from the ICGA side, besides those 'hard core' addicted computer chess guys and women, all of them excellent company also in restaurants, the few others who show up are there at most just for a year or 3.

If we sometimes forgive people for not showing up at 1 or 2 events, then we can also add Gerd Isenberg (who might be missing in Beijing) and David Levy.

Especially David is even better company in a restaurant; sometimes he pays the bill.


Note that at ICGA events it is very easy to find the board (Jaap, David and Joke).


In Paderborn 1999 someone needed them, so we advised the guy to look on the internet and search for the most expensive hotel and most expensive restaurant; indeed, in hotel Arosa the ICGA board had dinner.


I do not blame authors for not showing up at events. Showing up is expensive, and most authors have many talents and get asked for many different types of events. Most events you have to pay for.


Most of them only show up when their engine is 'in big shape'. You can be sure that the strongest engines also join an event. An author knows pretty well when his engine has a chance of doing well. The same will be true for computer go. The programmers who know they have a chance of winning will all be there for sure.


From the above elite group of ICGA members/participants that shows up a lot at the computer chess events, at least 1 is not bribable.


Maybe that's why I am looking for a job now. Even having no certainty of income as of now, it would not occur to me, not even at gunpoint, to recognize some other event as the official world championship, when it is clear that the entire organisation was officially planned and scheduled for China.


It is not a problem when others organize events; on the contrary, when others also organize events that's even better.

It is nice to have many events; that makes a science more popular.

Yet do not forget that the real and official event is in Beijing.

That aside, every year when the location is far away it is tougher to participate; the struggle to find a sponsor to pay my bills is very tough. Some years I succeed in that; other years I pay it myself.

The ticket costs I have are far larger than the amount of money the ICGA pays back. Add a hotel and extra travel time (the airplane trip takes something like 10 hours, so you quickly lose 2 days). Additionally, being born in a nation where it hardly ever gets hot, I cannot stand heat very well. I figured that out in Tel Aviv in 2004.


In computer chess there is another cost that most of you ignore: hardware. Easy if you come from Japan, the USA, Germany, the UK or France. I come from a tiny nation. There are no high-clocked 8-core machines here that I can get from sponsors to test and play on, in this tiny nation of 16.5 million inhabitants.

Now computer go is slowly getting more mature, in the sense that engines are slowly starting to beat more human beings, and before long will beat 99% of all go players; after that it will take forever until you can conclude objectively that they are stronger. Of course the average Joe on the street will conclude 10 years before it happens that the machine is stronger, as the go world will figure out within a year or 10 from now that the human side has a major weakness that will cause it to lose 10 years sooner than it should: a human being is in big need of money, and will therefore lose for money. This is partly because of arrogance. They do not prepare, and they play silly moves against the machine; usually several 10-kyu moves get made during the game, to still let the game look interesting and especially to give the sponsor an interesting result, instead of beating it big-time. The stronger the player, the dumber socially. It will be one of those players who will lose. If the best player cannot be bribed, like Topalov, who refused a match under the conditions offered, they'll skip him and wait until the next world champion comes along who is bribable (why do those always have a Russian passport?).


Even the human side winning 10 matches afterwards, or the machine losing games to minor grandmasters, doesn't matter. The first match a world top player loses will get all the attention. The damage will be done and the average Joe on the street will consider the machine stronger. Joe is always right; this is because Joe follows the first marketing principle, which says:

It is better to be first than to be best
(Al Ries & Jack Trout)

For a titled player like me, comparable to about 1-4 dan professional (I'm 4 dan only when a game is really important to my team; in all other games I'm at most 1 dan), I needed a