Re: [Computer-go] Standard Computer Go Datasets - Proposal

Gonçalo Mendes Ferreira Fri, 13 Nov 2015 03:48:53 -0800

I think that if you start calculating the Zobrist hashes and scraping features yourself, you will end up with a never-ending variety of datasets.

I would prefer datasets of whole, high-quality games without SGF errors, perhaps cleaned of identifying information. Parsing an SGF file is already trivial. I personally divide them by:


- Handicap used or not
- Normal komi (5.5-7.5) or not; this disqualifies some older games
- Rules used
- Board size
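The four-way split above can be read directly from the root node of an SGF file. What follows is a minimal Python sketch. It assumes the FF[4] root properties HA (handicap), KM (komi), RU (rules) and SZ (board size). The function names root_properties and categorize are hypothetical, and the tiny parser deliberately ignores everything after the root node.

```python
def root_properties(sgf: str) -> dict:
    """Return the first value of each property in the root node of an SGF game.

    Minimal sketch: handles bracketed values with backslash escapes,
    stops at the end of the root node and ignores variations.
    """
    props = {}
    i = sgf.index(";") + 1  # the root node starts after the first ';'
    key = ""
    reading_key = False
    while i < len(sgf):
        c = sgf[i]
        if c in ";()":  # next node or end of tree: root node is done
            break
        if c == "[":
            # Read the value up to the next unescaped ']'.
            j = i + 1
            chars = []
            while sgf[j] != "]":
                if sgf[j] == "\\":  # escaped character, e.g. \]
                    j += 1
                chars.append(sgf[j])
                j += 1
            props.setdefault(key, "".join(chars))  # keep the first value only
            reading_key = False
            i = j + 1
            continue
        if c.isupper():  # property identifiers are upper-case letters
            if not reading_key:
                key = ""
                reading_key = True
            key += c
        i += 1
    return props


def categorize(sgf: str) -> dict:
    """Split a game along four axes: handicap, komi, rules, board size."""
    p = root_properties(sgf)
    handicap = int(p.get("HA", "0").strip() or 0)
    try:
        komi = float(p.get("KM", ""))
    except ValueError:
        komi = None  # missing or malformed komi, common in older records
    return {
        "handicap": handicap > 0,
        "normal_komi": komi is not None and 5.5 <= komi <= 7.5,
        "rules": p.get("RU", "").strip().lower() or "unknown",
        # SZ may be "19" or "19:13" (columns:rows); the FF[4] default is 19
        "board_size": int(p.get("SZ", "19").split(":")[0]),
    }


game = "(;GM[1]FF[4]SZ[19]KM[7.5]HA[0]RU[Chinese];B[pd])"
print(categorize(game))
# {'handicap': False, 'normal_komi': True, 'rules': 'chinese', 'board_size': 19}
```

A game with no KM property is treated as having non-normal komi. That matches the remark that this criterion rules out some older games.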

Following the idea of providing more raw information instead of very specific, already-extracted features, it would be interesting to also have the playing times, although I don't know where you would get them from.

You'd be an angel if you could provide a large dataset of games played under Chinese rules, especially on board sizes other than 19x19.

It would, of course, also have to be completely free for any use. I personally only use the KGS 6d+ games and a collection of 70k pro games whose origin I don't know. GoGoD is proprietary. :)


Gonçalo F.

On 11/13/2015 08:39 AM, Josef Moudrik wrote:

Hello List,

There has been some debate in science about making research more
reproducible and open. Recently, I have been thinking about creating a
standard, public, fixed dataset of Go games, mainly to ease comparison of
different methods, to make results more reproducible and perhaps to free
authors from the burden of composing a dataset. I think the current
practice can be improved a lot.

Since the success of this endeavor crucially depends on how many authors
use the dataset, I would like to ask you (potential authors) a few
questions:

1) Would this be welcomed and used? Would you personally use it? (Or am I
reinventing the wheel?)

2) What parameters should the dataset have? In my opinion, the number of
dataset variants (if any) should be kept to a bare minimum to reduce
"fragmentation".

2a) Size: My current view is that at least two sizes are necessary: a small
dataset (1000-2000 games?) and a large one (50000-60000 games).
2b) Strength & year span: Currently I am thinking about including only
modern professional games (1970-2015).
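One way to keep a fixed dataset reproducible is to make the selection itself deterministic. The code below is a minimal sketch. It assumes each game has a stable identifier and a year taken from the SGF DT property. The names year_from_dt, stable_key and select_subsets are hypothetical, and the default sizes and years are simply the numbers proposed above.

```python
import hashlib


def year_from_dt(dt: str) -> int:
    """Year of an SGF DT value such as '1985-03-01' or '1985-03-01,03-02'."""
    return int(dt.strip()[:4])


def stable_key(game_id: str) -> str:
    """Order games by a hash of their identifier, independent of input order."""
    return hashlib.sha256(game_id.encode("utf-8")).hexdigest()


def select_subsets(games, first_year=1970, last_year=2015,
                   small=1000, large=50000):
    """Pick fixed small and large subsets from (game_id, year) pairs.

    The small subset is a prefix of the large one, so it is nested inside it.
    """
    eligible = sorted(
        (gid for gid, year in games if first_year <= year <= last_year),
        key=stable_key,
    )
    large_ids = eligible[:large]
    return large_ids[:small], large_ids
```

The games are ordered by a hash of their identifier rather than by a seeded shuffle. As a result, the selection stays the same no matter what order the games are read in. Taking the small set as a prefix of the large one also means every small-set game appears in the large set.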

3) Do you have any other comments, requirements or ideas for the dataset?


Thanks for your attention,
Kind regards
Josef Moudrik



_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
