On Mon, Jan 02, 2017 at 11:42:30AM -0800, David Fotland wrote:

> I think the character set property just refers to the contents
> of comments and similar fields. The sgf format itself is entirely
> in the common characters in UTF-8 and US-ASCII.
> There is no need to assume a character set before the property.
> If you find the character set property in the root node,
> it should apply to a root comment, even if it comes earlier
> in the properties in the root node.

In order to recognize the end of a comment C[text], one has to
recognize the closing ]. If the character set is multibyte, it may
have characters that have ] as a second byte. (For example, the word
Honinbo, spelled in Big5, contains a byte ']'.)

If one escapes bytes in the middle of a character, the resulting file
is no longer a text file, and corruption is the result.
If one does not escape bytes in the middle of a character, one needs
to be able to recognize the characters, i.e., know the character set.

That is why the CA[] property needs to come before any non-ASCII text.

Andries

[Let me also bcc you directly - my previous reply did not make it
to the list yet.]
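
The Honinbo example can be checked with a few lines of Python. This is only a rough sketch, not a real SGF parser: the function names naive_comment and charset_aware_comment are made up for illustration, the input is assumed to be a single C[...] property, and Python's built-in "big5" codec is assumed to match the encoding the file was written in.

```python
def naive_comment(prop: bytes) -> bytes:
    """Scan raw bytes for the first unescaped ']' without knowing the charset."""
    start = prop.index(b"[") + 1
    i = start
    while i < len(prop):
        if prop[i] == 0x5C:      # backslash: skip the escaped byte
            i += 2
            continue
        if prop[i] == 0x5D:      # byte value of ']'
            return prop[start:i]
        i += 1
    raise ValueError("no closing ] found")


def charset_aware_comment(prop: bytes, charset: str) -> str:
    """Decode first, then scan characters, so trail bytes are never inspected."""
    text = prop.decode(charset)
    start = text.index("[") + 1
    i = start
    while i < len(text):
        if text[i] == "\\":      # backslash: skip the escaped character
            i += 2
            continue
        if text[i] == "]":
            return text[start:i]  # escapes are left in place in this sketch
        i += 1
    raise ValueError("no closing ] found")


honinbo = "\u672c\u56e0\u574a"                 # Honinbo in Chinese characters
prop = ("C[" + honinbo + "]").encode("big5")   # unescaped, as a Big5 writer might emit it

print(0x5D in honinbo.encode("big5"))          # is there a ']' byte inside the word?
print(naive_comment(prop) == honinbo.encode("big5"))
print(charset_aware_comment(prop, "big5") == honinbo)
```

The byte-level scan stops at the ']' byte inside the word and returns a truncated comment, while the second function finds the real end only because it was told the character set up front.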
