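> standard public fixed dataset of Go games, mainly to ease comparison of
> different methods, to make results more reproducible and maybe free the
> authors of the burden of composing a dataset.

Maybe the first question should be whether people want a database of *positions* or *games*.

I imagine a position database to be a set of board descriptions, with each pro move marked on it. Ideally each move would say not just the number of times it was chosen, but break it down by rank of player.

Each position would have a Zobrist hash calculated in all 8 orientations (the rotations and reflections of the board), and the lowest chosen. This handles rotations and duplicates. If there is a ko-illegal point on the board, that needs to be stored, and also be part of the Zobrist hash.

A database of positions has some advantages:
  * No licensing issues (*)
  * Rotational duplicates already removed
  * Ready-to-go with the information (most) programs want to learn.

The advantages of storing games:
  * accountability/traceability
  * for programs that want to learn sequences of moves.

Darren

*: At least that was my conclusion when I looked into this before. Game collections can be copyrighted; moves cannot. A database of moves can be freely distributed, even if it was generated from copyrighted game collections, as long as there exists no way to regenerate the game collection from it. Text corpora (used in machine translation studies, for instance) follow the same idea: if you split the corpora into sentences, then shuffle them randomly, you can distribute the set of sentences.

(I did wonder about storing player ranks: if a given position has a move chosen by only a single 9p, and you can then extract each follow-up position, you could extract a game. But, IMHO, you cannot regenerate any particular game collection this way. If it is a concern, it can be solved by only using a random 80% of moves from games.)

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go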
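
For what it's worth, here is a minimal sketch in Python of the "hash in all 8 orientations, keep the lowest" idea, with the ko point folded into the hash. Everything in it is an assumption made for illustration: the names (`canonical_hash`, `symmetries`, `STONE_KEYS`, `KO_KEYS`), the 19x19 board, the fixed random seed, and the choice to represent a position as a dict of `(x, y) -> colour`. Side to move is not hashed here; a real database would probably want that too.

```python
import random

SIZE = 19
BLACK, WHITE = 1, 2

# Fixed seed (an assumption) so the same position always gets the same hash.
_rng = random.Random(0)
# One 64-bit key per (point, colour); index 0 of each triple is unused (empty).
STONE_KEYS = [[_rng.getrandbits(64) for _ in range(3)] for _ in range(SIZE * SIZE)]
# One extra 64-bit key per point, used when that point is ko-illegal.
KO_KEYS = [_rng.getrandbits(64) for _ in range(SIZE * SIZE)]


def symmetries(x, y, n=SIZE):
    """Return the 8 images of (x, y) under the board's rotations/reflections.

    Index k of the result is always the same transformation, so applying
    index k to every point transforms the whole position consistently.
    """
    m = n - 1
    return [
        (x, y), (m - x, y), (x, m - y), (m - x, m - y),
        (y, x), (m - y, x), (y, m - x), (m - y, m - x),
    ]


def canonical_hash(stones, ko=None):
    """Hash the position in all 8 orientations and return the lowest value.

    stones: dict mapping (x, y) -> BLACK or WHITE
    ko:     optional (x, y) of the ko-illegal point
    """
    hashes = []
    for k in range(8):
        h = 0
        for (x, y), colour in stones.items():
            tx, ty = symmetries(x, y)[k]
            h ^= STONE_KEYS[ty * SIZE + tx][colour]
        if ko is not None:
            tx, ty = symmetries(*ko)[k]
            h ^= KO_KEYS[ty * SIZE + tx]
        hashes.append(h)
    return min(hashes)
```

Two positions that differ only by a rotation or reflection then map to the same key, so their move counts can be merged under one database entry. Note that the moves stored with the position would need the same transformation applied (the orientation index that produced the minimum), otherwise the marked moves would point at the wrong intersections.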
Maybe the first question should be is if people want a database of *positions* or *games*. I imagine a position database to be a set of board descriptions, with each pro move marked on it. Ideally each move would say not just the number of times it was chosen, but break it down by rank of player. Each would have a zobrist hash calculated, in all 8 combinations, and the lowest chosen. This handles rotations and duplicates. If there was as a ko-illegal point on the board that needs to be stored, and also be part of the zobrist hash. A database of positions has some advantages: * No licensing issues (*) * Rotational duplicates already removed * Ready-to-go with the information (most) programs want to learn. The advantages of storing games: * accountability/traceability * for programs who want to learn sequences of moves. Darren *: At least that was my conclusion when I looked into this before. Game collections can be copyrighted; moves cannot. A database of moves can be freely distributed, even it was generated from copyrighted game collections, as long as there exists no way to regenerate the game collection from it. Text corpora (used in machine translation studies, for instance) follow the same idea: if you split the corpora into sentences, then shuffle them up randomly, you can distribute the set of sentences. (I did wonder about storing player ranks, e.g. if a given position has a move chosen by only a single 9p, and you can then extract each follow-up position, you could extract a game. But, IMHO, you cannot regenerate any particular game collection this way. If it is a concern, it can be solved by only using a random 80% of moves from games.) _______________________________________________ Computer-go mailing list Computer-go@computer-go.org http://computer-go.org/mailman/listinfo/computer-go