Niclas Hedhman wrote:
On Monday 05 September 2005 14:43, Antonio Gallardo wrote:
Of course I am aware that both codesets (Shift-JIS and ISO-8859-1) are
different Unicode subsets. This is the same as you stated.
No. Pier isn't mixing up Unicode (a sequence of characters) with the
mapping of those characters to fixed- or variable-length encoded
byte streams.
The fact that character 65 in Unicode is in many encodings mapped to the byte
value 65 is for convenience only, and that fact should be ignored.
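
To make the character-versus-bytes distinction concrete, here is a minimal Python sketch. The helper name `encodings_of` is made up for this illustration; the codec names are the standard ones shipped with Python. It shows one code point producing different byte sequences (or none at all) depending on the encoding chosen.

```python
def encodings_of(ch, names=("ascii", "iso-8859-1", "shift_jis", "utf-8")):
    """Return {encoding name: bytes, or None if the character is not representable}."""
    result = {}
    for name in names:
        try:
            result[name] = ch.encode(name)
        except UnicodeEncodeError:
            # This encoding has no byte sequence for the character.
            result[name] = None
    return result


# U+0041 (character 65): the same single byte in all four encodings.
print(encodings_of("A"))

# U+00E9 (e with acute accent): one byte in ISO-8859-1, two bytes in UTF-8.
print(encodings_of("\u00e9"))

# U+3042 (hiragana "a"): two bytes in Shift-JIS, three in UTF-8,
# and not representable in ISO-8859-1 at all.
print(encodings_of("\u3042"))
```

The "A" case is the convenience mentioned above: the byte happens to equal the code point, which says nothing about how the other characters are mapped.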
Does our SVN use UTF-8 as the default charset (or encoding), or not?
Subversion uses binary data, and is agnostic to any encodings in the data (or
so they say). AFAIU, marking files as text only deals with the line endings
and how the diff mails are generated.
The --encoding argument applies to commit messages.
Paths and URLs/URIs have additional encoding requirements.
Correct.
And it is also worth noting that SVN before 1.2 and CVS2SVN make a pretty
broken combination when the commit message in CVS used an encoding that
was not UTF-8.
As an example, try to get svn log of the apache repository and the svn
client will fail, because we have three commit messages in latin-1
placed, as binary, by cvs2svn into svn (and prior to 1.2 there was no
encoding validation checking in svn) that get moved into the XML file
that is passed between the svn server and client, which is using UTF-8
as the encoding.
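
The failure mode is easy to reproduce outside of svn. The following is only a sketch in Python, not svn's actual validation code, and the commit message is invented for illustration: latin-1 bytes containing an accented character are simply not a valid UTF-8 sequence, so anything that decodes the stream as UTF-8 has to reject them.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical commit message containing U+00E9 (e with acute accent).
message = "Andr\u00e9 fixed the build"

latin1_bytes = message.encode("iso-8859-1")  # what ended up stored as-is
utf8_bytes = message.encode("utf-8")         # what the UTF-8 stream expects

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False: 0xE9 starts a multi-byte sequence,
                                    # but the next byte is a plain space
```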
I've asked infra@ to fix this, but since it is not really high priority
(only data archeologists like myself care about those things), it is
unlikely to get fixed.
Anyhow, I agree with Pier: we should *only* use ASCII and escape Unicode
characters explicitly, the \uxxxx way.
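
For what it's worth, here is a rough Python sketch of that escaping. The function name `escape_non_ascii` is made up, and it assumes Java-style escapes, where each \uxxxx is one UTF-16 code unit, so characters outside the Basic Multilingual Plane come out as a surrogate pair. Existing backslashes are left untouched.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with \\uxxxx escape(s); ASCII passes through."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Encode to UTF-16 (big-endian, no BOM) and emit one escape per
        # 16-bit code unit; supplementary characters yield two escapes.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


print(escape_non_ascii("caf\u00e9"))    # caf\u00e9
print(escape_non_ascii("\U0001F600"))   # \ud83d\ude00
```

The result is pure ASCII, so it survives any of the encodings discussed in this thread unchanged.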
--
Stefano.