On Sun, 17 Apr 2005, Russell King wrote:
BTW, there appear to be "errors" in the history committed thus far.
I'm not sure where these came from though. Some of them could be
UTF8 vs ASCII issues, ...
One thing which definitely needs to be considered is - what character
encoding are the comments to be stored as?

Linus Torvalds replied:
To git, it's just a byte stream, and you can have binary comments if you
want to. I personally would prefer to move towards UTF eventually, but I
really don't think it matters a whole lot as long as 99.9% of everything
we'd see there is still 7-bit ASCII.

I would _heartily_ recommend moving towards UTF-8 as the internal charset
for all comments. Alternatives are possible (e.g., recording the charset
in the header), but they're incredibly messy. Even if you don't normally
work in UTF-8, it's pretty easy to set most editors up to read & write
UTF-8. Having the data stored as a constant charset eliminates a raft of
error-prone code.
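
To make the constant-charset argument concrete, here is a minimal Python sketch. It is not git code, and the names is_7bit_ascii, read_comment and read_comment_with_header are hypothetical, made up for illustration. It relies on one fact: 7-bit ASCII is a strict subset of UTF-8, so comments that are plain ASCII today are already valid UTF-8 and need no conversion.

```python
def is_7bit_ascii(comment: bytes) -> bool:
    """Return True if every byte of the comment is below 0x80."""
    return all(b < 0x80 for b in comment)


def read_comment(comment: bytes) -> str:
    """Decode a comment, assuming the store always holds UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    return comment.decode("utf-8")


def read_comment_with_header(comment: bytes, charset: str) -> str:
    """Hypothetical alternative: a charset label is recorded per comment.

    Every reader must carry the label around and trust it to be right.
    """
    return comment.decode(charset)


if __name__ == "__main__":
    ascii_comment = b"Fix typo in README"
    # ASCII bytes decode the same way under UTF-8, so nothing changes.
    print(is_7bit_ascii(ascii_comment), read_comment(ascii_comment))

    utf8_comment = "Bj\u00f6rn fixed the build".encode("utf-8")
    print(read_comment(utf8_comment))

    # The same name stored as Latin-1 is not valid UTF-8; without a
    # label, a reader cannot tell which charset was intended.
    latin1_comment = "Bj\u00f6rn fixed the build".encode("latin-1")
    try:
        read_comment(latin1_comment)
    except UnicodeDecodeError as err:
        print("not UTF-8:", err.reason)
    print(read_comment_with_header(latin1_comment, "latin-1"))
```

With a single fixed charset, read_comment is the only decoding path a tool needs. The header-based variant works too, but each caller then has to handle a missing, unknown, or wrong label, which is the kind of error-prone code a constant charset avoids.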

--- David A. Wheeler
