On Wed, May 30, 2007 at 02:39:47PM -0700, Paul Schauble wrote:
> Actually, no. If you just took the CR without taking the other byte of
> the 16 bit character, then the byte-to-character phase would be wrong
> for the following line. You have to actually handle the data as 16 bit
> characters.
> 
> That's why I'm wondering if the underlying Haskell system can handle 16
> bit characters. If not, then how could this change be made?
> 
> BTW, the usual problem with programs handling UTF-16 is the null
> characters contained within strings. This usually doesn't work out well
> unless the underlying language handles wide characters.

We don't use Haskell Chars for file data in darcs; it's just raw bytes.
Only two functions (well, maybe a couple more, counting the marking of
conflicts) would need to deal with UTF-16: the ones that break file data
into lines and concatenate the lines back together (linesPS and
unlinesPS).  So it'd still be pretty easy to add support for UTF-16;
there's a sketch of what the line-breaking might look like below.  The
hard work would all be in the options (similar to the binary-file
handling) to let users specify which line-breaking they want.
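
To make that concrete, here's a rough sketch, not darcs code: the names
linesUTF16LE/unlinesUTF16LE are made up, and Data.ByteString stands in
for the packed-string type darcs actually uses.  The point is just that
a UTF-16LE-aware line splitter has to step two bytes at a time, so a
0x0A byte that's half of some other 16-bit code unit isn't mistaken for
a newline:

  import qualified Data.ByteString as B

  -- Split UTF-16LE data on the 16-bit code unit 0x000A, i.e. the byte
  -- pair (0x0A, 0x00) sitting at an even offset.
  linesUTF16LE :: B.ByteString -> [B.ByteString]
  linesUTF16LE bs
    | B.null bs = []
    | otherwise = go bs
    where
      go s = case findNewline 0 s of
               Nothing -> [s]
               Just i  -> B.take i s : go (B.drop (i + 2) s)
      findNewline i s
        | i + 1 >= B.length s = Nothing
        | B.index s i == 0x0A && B.index s (i + 1) == 0x00 = Just i
        | otherwise = findNewline (i + 2) s

  -- Joining the lines back up again (the unlinesPS side of the pair).
  unlinesUTF16LE :: [B.ByteString] -> B.ByteString
  unlinesUTF16LE = B.intercalate (B.pack [0x0A, 0x00])

A UTF-16BE file would just need the byte pair reversed (plus a BOM
check), which is part of why the option handling is where the real work
would be.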

Null characters are no problem, as we don't use C strings.

Haskell does use 32-bit characters for its Char type; darcs just doesn't
use that type for file data.  It's a waste to convert from 8 bits to 32
bits and back again, as I'm sure you can imagine.
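
Just to illustrate the round trip being avoided (illustration only, not
anything darcs does; the function name is made up, though BC.pack and
BC.unpack are the real Data.ByteString.Char8 functions):

  import qualified Data.ByteString.Char8 as BC

  -- unpack expands every byte into a full Char (a 32-bit code point),
  -- and pack squeezes them back down -- pure overhead for file data.
  wastefulRoundTrip :: BC.ByteString -> BC.ByteString
  wastefulRoundTrip = BC.pack . BC.unpack
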
-- 
David Roundy
Department of Physics
Oregon State University
