On Sun, Jul 16, 2006 at 01:49:14PM -0700, Zack Weinberg wrote:
> On 7/14/06, Nathaniel Smith <[EMAIL PROTECTED]> wrote:
> >> +// ??? Ensure use of UTF8 encoding internally, validate encoding here.
> >
> >^^ Hmm?
>
> I have gotten lost in the conversions and the wrappers, and cannot
> tell what encoding (if any) can be relied upon at this point in the
> code.  The exclusion of characters 00-1f and 7f, but none in the 80-ff
> range, makes me think it's supposed to be utf8 (it's clearly not a
> fixed-width 16- or 32-bit encoding; if it were any single-byte 8859.n
> encoding, we should also exclude 80-9f; any other variable-width
> encoding that I know of requires rather more smarts to find bad
> characters in...)

file_paths are always utf8 internally.

> But if it _is_ guaranteed to be utf8 at this point, there are a number
> of invalid byte sequences that we ought to be weeding out: notably ED
> A0 xx .. ED BF xx and overlength encodings like E0 9F 80; unless we
> have a guarantee from elsewhere that we're not going to get them.  I
> have code (from libcpp) that I can adapt to do this.

See utf8_validate, and the call to it at the top of the file_path
constructor.  (utf8_validate is itself stolen from glib.)

-- Nathaniel

-- 
Sentience can be such a burden.
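
For reference, the byte-sequence checks discussed above (the surrogate
range ED A0 xx .. ED BF xx and overlength forms such as E0 9F 80) can be
sketched roughly as follows. This is only an illustration: the name
is_well_formed_utf8 is hypothetical, and the code is neither monotone's
utf8_validate nor the glib or libcpp routine. It checks well-formedness
only and does not apply the separate 00-1f / 7f exclusion that file_path
performs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not monotone's utf8_validate.
// Returns true if s consists entirely of well-formed UTF-8 sequences.
bool is_well_formed_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;

    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);

        // Plain ASCII byte.
        if (b0 <= 0x7F) {
            ++i;
            continue;
        }

        std::size_t len = 0;
        // Allowed range for the second byte; narrowed for some lead bytes.
        unsigned char lo = 0x80;
        unsigned char hi = 0xBF;

        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;  // rejects overlength E0 80..9F xx
            if (b0 == 0xED) hi = 0x9F;  // rejects surrogates ED A0..BF xx
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;  // rejects overlength F0 80..8F xx xx
            if (b0 == 0xF4) hi = 0x8F;  // rejects code points above U+10FFFF
        } else {
            // Stray continuation byte (80..BF), overlength lead (C0, C1),
            // or a lead byte that is never valid (F5..FF).
            return false;
        }

        // Sequence must not run off the end of the string.
        if (n - i < len) return false;

        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) return false;

        // Any remaining bytes must be ordinary continuation bytes.
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if (c < 0x80 || c > 0xBF) return false;
        }

        i += len;
    }
    return true;
}
```

Restricting only the second byte is enough here, because every
overlength or surrogate form is already identifiable from the lead byte
plus the byte that follows it; the later bytes just need to be in
80..BF.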
