> standard public fixed dataset of Go games, mainly to ease comparison of
> different methods, to make results more reproducible and maybe free the
> authors of the burden of composing a dataset. 

Maybe the first question should be whether people want a database of
*positions* or *games*.

I imagine a position database to be a set of board descriptions, with
each pro move marked on it. Ideally each move would say not just the
number of times it was chosen, but break it down by rank of player.
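To make that concrete, here is a rough sketch of what one record might look like. The class name, the one-character-per-point board string, and the coordinate and rank labels are all just assumptions for illustration, not a proposed format.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PositionRecord:
    """One board position plus the pro moves seen in it (hypothetical layout)."""
    board: str  # assumed encoding: one character per point, row by row
    # move -> rank -> number of times a player of that rank chose the move
    moves: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(int)))

    def add_move(self, move, rank):
        """Record that a player of the given rank played this move here."""
        self.moves[move][rank] += 1

    def total(self, move):
        """Number of times the move was chosen, summed over all ranks."""
        return sum(self.moves.get(move, {}).values())


rec = PositionRecord(board="." * 361)  # empty 19x19 board
rec.add_move("Q16", "9p")
rec.add_move("Q16", "9p")
rec.add_move("Q16", "5d")
rec.add_move("D4", "9p")
```

With this layout the overall count for a move is just the sum over ranks, so nothing is lost by storing the breakdown.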

Each would have a Zobrist hash calculated in all 8 orientations (four
rotations, each with and without mirroring), and the lowest chosen. This
handles rotations, reflections and duplicates. If there is a ko-illegal
point on the board, that needs to be stored, and also be part of the
Zobrist hash.
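A minimal sketch of the lowest-of-8 hash, under my own assumptions: a 19x19 board, stones given as a dict keyed by (row, col), a fixed random seed so keys are reproducible, and a separate key table for the ko point. It ignores side to move, and all names here are made up for the example.

```python
import random

SIZE = 19
BLACK, WHITE = 1, 2

_rng = random.Random(0)  # fixed seed so the key tables are reproducible
# One random 64-bit key per (point, colour); index 0 (empty) is unused.
STONE_KEYS = [[[_rng.getrandbits(64) for _ in range(3)]
               for _ in range(SIZE)] for _ in range(SIZE)]
# One random 64-bit key per possible ko-illegal point.
KO_KEYS = [[_rng.getrandbits(64) for _ in range(SIZE)] for _ in range(SIZE)]


def symmetries(r, c, n=SIZE):
    """Return the 8 images of point (r, c): 4 rotations, each also mirrored."""
    m = n - 1
    return [(r, c), (c, m - r), (m - r, m - c), (m - c, r),
            (r, m - c), (m - c, m - r), (m - r, c), (c, r)]


def canonical_hash(stones, ko=None):
    """Lowest Zobrist hash over all 8 orientations of the position.

    stones: dict {(row, col): BLACK or WHITE}
    ko:     optional (row, col) of the ko-illegal point
    """
    hashes = [0] * 8
    for (r, c), colour in stones.items():
        for i, (tr, tc) in enumerate(symmetries(r, c)):
            hashes[i] ^= STONE_KEYS[tr][tc][colour]
    if ko is not None:
        for i, (tr, tc) in enumerate(symmetries(*ko)):
            hashes[i] ^= KO_KEYS[tr][tc]
    return min(hashes)


stones = {(3, 3): BLACK, (15, 16): WHITE, (2, 5): BLACK}
# The same position rotated by 90 degrees.
rotated = {(c, SIZE - 1 - r): col for (r, c), col in stones.items()}
same = canonical_hash(stones) == canonical_hash(rotated)
```

Because the 8 orientations form a closed set, a position and any rotated or mirrored copy of it produce the same set of 8 hashes, so taking the minimum gives one key for all of them.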

A database of positions has some advantages:
  * No licensing issues (*)
  * Rotational duplicates already removed
  * Ready-to-go with the information (most) programs want to learn.

The advantages of storing games:
  * accountability/traceability
  * for programs that want to learn sequences of moves.


*: At least that was my conclusion when I looked into this before. Game
collections can be copyrighted; moves cannot. A database of moves can be
freely distributed, even if it was generated from copyrighted game
collections, as long as there exists no way to regenerate the game
collection from it.

Text corpora (used in machine translation studies, for instance) follow
the same idea: if you split a corpus into sentences, then shuffle them
randomly, you can distribute the set of sentences.

(I did wonder about storing player ranks, e.g. if a given position has a
move chosen by only a single 9p, and you can then extract each follow-up
position, you could extract a game. But, IMHO, you cannot regenerate any
particular game collection this way. If it is a concern, it can be
solved by only using a random 80% of moves from games.)
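The 80% idea could look something like the following. This is only a sketch: the function name, the `keep` parameter, and rounding to the nearest whole move are my assumptions.

```python
import random


def sample_moves(game_moves, keep=0.8, rng=None):
    """Keep a random fraction of a game's moves, preserving their order.

    Returns (move_index, move) pairs so the position before each kept
    move can still be looked up while the game is being processed.
    """
    rng = rng or random.Random()
    k = round(len(game_moves) * keep)
    kept = sorted(rng.sample(range(len(game_moves)), k))
    return [(i, game_moves[i]) for i in kept]


game = ["move%d" % i for i in range(200)]  # stand-in for a real game record
subset = sample_moves(game, keep=0.8, rng=random.Random(42))
```

Since about one move in five is missing from every game, the chain of follow-up positions is broken in many places and a full game can no longer be walked from the database.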

Computer-go mailing list