On Wed, May 30, 2007 at 02:39:47PM -0700, Paul Schauble wrote:
> Actually, no. If you just took the CR without taking the other byte of
> the 16 bit character, then the byte-to-character phase would be wrong
> for the following line. You have to actually handle the data as 16 bit
> characters.
>
> That's why I'm wondering if the underlying Haskell system can handle 16
> bit characters. If not, then how could this change be made?
>
> BTW, the usual problem with programs handling UTF-16 is the null
> characters contained within strings. This usually doesn't work out well
> unless the underlying language handles wide characters.
We don't use Haskell Chars for file data in darcs; it's just raw bytes. And
only two functions (well, maybe a couple more, counting the marking of
conflicts) would need to deal with UTF-16: breaking into lines and
concatenating the lines back together (linesPS and unlinesPS). So it'd
still be pretty easy to add support for UTF-16. The hard work would all be
in the options (similar to binary handling) to allow users to specify which
line-breaking they want.

Null characters are no problem, as we don't use C strings. Haskell does use
32-bit characters for its Char type; darcs just doesn't use that type. It's
a waste to convert from 8-bit to 32-bit and back again, as I'm sure you can
imagine.

-- 
David Roundy
Department of Physics
Oregon State University
_______________________________________________
darcs-devel mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-devel
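[A minimal sketch of the byte-level line splitting discussed above. This is
not darcs's actual linesPS/unlinesPS; the names linesUTF16LE and
unlinesUTF16LE are made up for illustration, and it assumes UTF-16LE input
with LF line endings, where the newline is the two-byte code unit 0x0A 0x00
starting at an even offset. CRLF would need the same two-bytes-at-a-time
treatment.]

    import qualified Data.ByteString as B

    -- Sketch: split raw UTF-16LE bytes into lines without ever decoding
    -- to Char.  A newline in UTF-16LE is the code unit 0x0A 0x00, and it
    -- must start on an even byte offset (otherwise it could be the second
    -- half of some other character), so we scan two bytes at a time.
    linesUTF16LE :: B.ByteString -> [B.ByteString]
    linesUTF16LE ps
      | B.null ps = []
      | otherwise = go 0
      where
        go i
          | i + 1 >= B.length ps = [ps]
          | B.index ps i == 0x0a && B.index ps (i + 1) == 0x00 =
              B.take i ps : linesUTF16LE (B.drop (i + 2) ps)
          | otherwise = go (i + 2)

    -- The inverse just puts the two-byte newline back between lines.
    unlinesUTF16LE :: [B.ByteString] -> B.ByteString
    unlinesUTF16LE = B.intercalate (B.pack [0x0a, 0x00])

Everything stays in the 8-bit ByteString representation, matching the point
above that file data never goes through Haskell's 32-bit Char type.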
