Marcin 'Qrczak' Kowalczyk wrote:
> > What if something occurs in the file that does not form a valid, say,
> > UTF-16 sequence?
>
> It's clearly invalid in the specs, so there would be an error detected.
> But '\0' characters are valid UTF-8, so the only reason to disallow them
> could be laziness, and although I am lazy, I do care about my language
> more :-)
>
> An example when they occur in what can be considered text: GNU find with
> option -print0, usually consumed with xargs -0. They are used as
> separators between filenames because they are guaranteed to not occur in
> a filename.

OK.  So you need Pascal-style (length-counted) strings instead of
null-terminated strings.  In that case it is still easy to use the
existing code for length-known string operations: GNU memmem,
Boyer-Moore-Horspool (BMH) searching, etc.
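
To illustrate what I mean by length-known operations, here is a minimal
sketch in C.  The names counted_str and counted_find are made up for this
example, and the search is a naive portable stand-in for GNU memmem, not
BMH.  The point is only that nothing in it treats '\0' specially, because
the length is carried alongside the data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type (an assumption for this sketch):
   the length is stored explicitly, so the data may contain '\0' bytes. */
struct counted_str {
    const char *data;
    size_t len;
};

/* Naive length-aware substring search, a portable stand-in for GNU memmem.
   Returns the byte offset of the first match, or (size_t)-1 if none. */
static size_t counted_find(struct counted_str hay, struct counted_str needle)
{
    if (needle.len == 0)
        return 0;                      /* empty needle matches at offset 0 */
    if (needle.len > hay.len)
        return (size_t)-1;

    size_t last = hay.len - needle.len;   /* last possible start offset */
    for (size_t i = 0; i <= last; i++) {
        /* Jump to the next occurrence of the needle's first byte. */
        const char *p = memchr(hay.data + i, needle.data[0], last - i + 1);
        if (p == NULL)
            break;
        i = (size_t)(p - hay.data);
        /* memcmp compares exactly needle.len bytes, NULs included. */
        if (memcmp(p, needle.data, needle.len) == 0)
            return i;
    }
    return (size_t)-1;
}
```

With a buffer such as "a.txt\0b.txt\0c.txt" (17 bytes, as find -print0
might produce), strstr would stop at the first '\0', while counted_find
sees the whole buffer and can even search for a needle that itself
contains a '\0'.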

> I do distinguish characters and bytes. I have separate types:
> ...

Good to know that. :-)

Best regards,

Wu Yongwei
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
