Marcin 'Qrczak' Kowalczyk wrote:
> > What if something occurs in the file and does not form a valid, say,
> > UTF-16 sequence?
>
> It's clearly invalid in the specs, so there would be an error detected.
> But '\0' characters are valid UTF-8, so the only reason to disallow them
> could be laziness, and although I am lazy, I do care about my language
> more :-)
>
> An example when they occur in what can be considered text: GNU find with
> option -print0, usually consumed with xargs -0. They are used as
> separators between filenames because they are guaranteed to not occur in
> a filename.
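With find -print0 output, the NUL bytes are data, so a C-string function such as strstr() stops at the first '\0' and never sees the later filenames. Any search has to take an explicit length instead. Below is a minimal sketch of a Boyer-Moore-Horspool search over (pointer, length) buffers. The function name bmh_search and its "return (size_t)-1 if not found" convention are my own assumptions for illustration. This is not code from any particular library.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length-aware Boyer-Moore-Horspool search.
   Returns the offset of the first occurrence of needle in hay, or
   (size_t)-1 if absent. Lengths are explicit, so '\0' is ordinary data. */
size_t bmh_search(const unsigned char *hay, size_t hay_len,
                  const unsigned char *needle, size_t needle_len)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    if (needle_len == 0)
        return 0;
    if (needle_len > hay_len)
        return (size_t)-1;

    /* Bytes absent from the needle allow a full-length skip. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = needle_len;
    /* Bytes in the needle (except its last byte) get shorter skips. */
    for (i = 0; i + 1 < needle_len; i++)
        shift[needle[i]] = needle_len - 1 - i;

    pos = 0;
    while (pos <= hay_len - needle_len) {
        /* memcmp compares exactly needle_len bytes, NULs included. */
        if (memcmp(hay + pos, needle, needle_len) == 0)
            return pos;
        /* Skip by the entry for the byte under the needle's last position. */
        pos += shift[hay[pos + needle_len - 1]];
    }
    return (size_t)-1;
}
```

For example, given the 18-byte buffer "a.txt\0b.txt\0c.txt\0", searching for the 6 bytes "b.txt\0" yields offset 6. Calling strstr() on the same buffer would only ever see "a.txt". Where it is available, GNU memmem() does the same job with the same (pointer, length) interface.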
OK. So you need Pascal-style (length-counted) strings instead of
null-terminated ones. In that case it is still easy to reuse existing
code for operations on strings of known length: GNU memmem,
Boyer-Moore-Horspool (BMH) searching, etc.

> I do distinguish characters and bytes. I have separate types:
> ...

Good to know. :-)

Best regards,

Wu Yongwei

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
