Re: Does CVS work with unicode files?

John Macdonald Tue, 11 Jul 2000 14:52:10 -0700
Pavel Roskin wrote :
|| Hello!
|| 
|| On Tue, 11 Jul 2000, Guus Leeuw wrote:
|| 
|| > >  From: ccyf [mailto:[EMAIL PROTECTED]]
|| > >  If files contain non-ascii characters, do cvs commands like diff 
|| > >  still work?
|| > 
|| > Nope. There are two possible ways of dealing with these:
|| > 1. Just check them in as ASCII, and possibly get them corrupted...
|| >    as CVS doesn't understand UNICODE
|| > 2. cvs add -kb <unicode-file>, so that CVS thinks they're binary
|| >    and leaves them alone (doesn't try to do *anything* with
|| >    the contents of the file).
|| 
|| I am by no means an expert in Unicode, but shouldn't UTF be of some help
|| here? I believe that the line endings in UTF are normal UNIX line endings.
|| UTF contains no characters that could confuse CVS.
|| 
|| It is important that CVS understands files as sets of lines, so that it
|| can make diffs and merge sources.

Unicode values are large integers, but they can be represented in a
number of different ways.  The most useful way for this purpose is
UTF8, which uses one or more 8-bit bytes for each Unicode character.
A Unicode character whose value is in the range 0 to 127 is stored
unchanged as a single byte and has the same meaning as the ASCII
character of the same value.  Larger Unicode values are stored in
multiple bytes - and each of those bytes is in the range 128 to 255,
so they will not be confused with normal ASCII values in the 0 to 127
range.
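
The byte ranges described above can be checked with a short Python
sketch.  This is only an illustration: the helper name utf8_bytes and
the sample characters are made up for the example, and it says
nothing about CVS itself.

```python
def utf8_bytes(ch: str) -> list[int]:
    """Return the UTF-8 encoding of one character as a list of byte values.

    Hypothetical helper, introduced only for this illustration.
    """
    return list(ch.encode("utf-8"))


# Sample characters, written as escapes so this source stays pure ASCII:
# 'A', e-acute, the euro sign, and a character outside the 16-bit range.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = utf8_bytes(ch)
    if ord(ch) < 128:
        # Values 0..127: a single byte, identical to the ASCII value.
        assert encoded == [ord(ch)]
    else:
        # Larger values: several bytes, every one in the range 128..255.
        assert len(encoded) > 1
        assert all(128 <= b <= 255 for b in encoded)
    print(f"U+{ord(ch):04X} -> {[hex(b) for b in encoded]}")
```

If every assertion passes, no byte of a multi-byte character falls in
the 0 to 127 range, at least for these samples.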

As long as CVS doesn't take any special notice of any values above
127, and as long as the Unicode is stored in UTF8, there should be no
problem.
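
To make that reasoning concrete, here is a rough Python sketch of why
a line-oriented tool should be safe with UTF-8: the newline byte (10)
is below 128, so it can never appear inside a multi-byte character,
and splitting the raw bytes into lines gives the same lines as
splitting the decoded text.  The function changed_lines is a
hypothetical, deliberately naive line-by-line comparison written for
this example; it is not how CVS or diff actually works.

```python
def changed_lines(old: bytes, new: bytes) -> list[tuple[int, bytes, bytes]]:
    """Naively compare two byte strings line by line.

    Returns (line_number, old_line, new_line) for each position where
    the lines differ.  Illustrative only; a real diff also handles
    inserted and deleted lines.
    """
    old_lines = old.split(b"\n")
    new_lines = new.split(b"\n")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(old_lines, new_lines))
        if a != b
    ]


# Two versions of a small file containing non-ASCII characters.
text_old = "na\u00efve\ncaf\u00e9\n"
text_new = "na\u00efve\ncaf\u00e9 au lait\n"

raw_old = text_old.encode("utf-8")
raw_new = text_new.encode("utf-8")

# Splitting the bytes on newline matches splitting the text on newline.
assert raw_old.split(b"\n") == [s.encode("utf-8") for s in text_old.split("\n")]

# Only the second line is reported as changed.
for number, before, after in changed_lines(raw_old, raw_new):
    print(number, before.decode("utf-8"), "->", after.decode("utf-8"))
```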

BUT: I haven't tried this myself.  I'm speaking from theory, not
direct experience.  CAVEAT EMPTOR.

-- 
Sleep should not be used as a substitute    | John Macdonald
for high levels of caffeine  -- FPhlyer     |   [EMAIL PROTECTED]