Niclas Hedhman wrote:
On Monday 05 September 2005 14:43, Antonio Gallardo wrote:


Of course I am aware that both code sets (Shift-JIS and ISO-8859-1) are
different Unicode subsets. This is the same as you stated.


No. Pier isn't confusing Unicode (a sequence of characters) with the mapping of those characters to fixed- or variable-length encoded byte streams. The fact that character 65 in Unicode is mapped to the byte value 65 in many encodings is for convenience only, and that fact should be ignored.
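
To make that distinction concrete, here is a small sketch in Python (chosen only for illustration; it is not something used in this thread). The helper name encode_in and the sample characters are my own assumptions. It shows one character keeping the same byte in several encodings, while another gets different bytes, or no mapping at all, depending on the encoding.

```python
def encode_in(ch, encodings):
    """Return {encoding: bytes or None}; None means the encoding cannot represent ch."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = ch.encode(enc)
        except UnicodeEncodeError:
            result[enc] = None
    return result


# Hypothetical selection of encodings, taken from the ones named in this thread.
ENCODINGS = ("iso-8859-1", "shift_jis", "utf-8")

# Code point 65 ("A") happens to become the single byte 65 in all three.
a_bytes = encode_in("A", ENCODINGS)

# Code point U+3042 (hiragana "a") is one character, but its bytes differ
# per encoding, and ISO-8859-1 cannot represent it at all.
hiragana_a = encode_in("\u3042", ENCODINGS)
```

The character is the same in every case; only the byte-level mapping changes, which is why the "65 maps to 65" coincidence should not be relied on.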


Does our SVN use UTF-8 as the default charset (or encoding), or not?


Subversion uses binary data, and is agnostic to any encodings in the data (or so they say). AFAIU, marking files as text only deals with the line endings and how the diff mails are generated.
The --encoding argument applies to commit messages.
Paths and URLs/URIs have additional encoding requirements.

Correct.

And it is also worth noting that SVN before 1.2 and cvs2svn make a pretty broken combination when the commit message in CVS used an encoding that was not UTF-8.

As an example, try to get the svn log of the Apache repository and the svn client will fail. We have three commit messages in Latin-1 that cvs2svn placed into svn as raw binary (prior to 1.2, svn did no encoding validation), and they get copied into the XML that is passed between the svn server and client, which uses UTF-8 as its encoding.
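
The failure mode can be reproduced in a few lines. This is only a sketch in Python, not what svn does internally; the sample message and the helper is_valid_utf8 are hypothetical. It shows that Latin-1 bytes stored as-is are rejected by a strict UTF-8 decoder, and that re-encoding them (assuming we know they were Latin-1) yields valid UTF-8.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical commit message containing one non-ASCII character (e-acute).
message = "caf\u00e9"

# What ends up stored if the Latin-1 bytes are copied over untouched.
latin1_bytes = message.encode("iso-8859-1")

# A consumer that expects UTF-8 cannot decode those bytes.
stored_ok = is_valid_utf8(latin1_bytes)

# Repair, assuming the original encoding is known to be Latin-1.
repaired = latin1_bytes.decode("iso-8859-1").encode("utf-8")
```

Note that the repair step only works because the source encoding is assumed; the bytes alone do not say which encoding produced them.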

I've asked infra@ to fix this, but since it is not really high priority (only data archaeologists like myself care about those things), it is unlikely to get fixed.

Anyhow, I agree with Pier: we should *only* use ASCII and escape Unicode characters explicitly, the \uxxxx way.
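
For what it's worth, producing such escapes is mechanical. Below is a rough sketch in Python; the function name escape_non_ascii is made up, and I am assuming Java-style escapes, where characters beyond U+FFFF are written as two \uxxxx units (a surrogate pair). It leaves ASCII untouched and rewrites everything else.

```python
def escape_non_ascii(text):
    """Replace every non-ASCII character with \\uxxxx escapes (UTF-16 code units)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Encode to big-endian UTF-16 so that characters outside the BMP
        # come out as a surrogate pair, i.e. two \uxxxx escapes.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


# Hypothetical sample text; the result contains only ASCII characters.
escaped = escape_non_ascii("caf\u00e9")
```

The output is plain ASCII, so it survives any of the encodings discussed above unchanged.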

--
Stefano.
