On 2020-03-09, Richard Ipsum <[email protected]> wrote:
> POSIX specifies that -d '\0' sets the delimiter to an empty string.
Hi Richard,
Sorry for the delay on the review. This mostly looks good. Just a few
questions/comments.
> diff --git a/libutf/utf.c b/libutf/utf.c
> index 897c5ef..cf46e57 100644
> --- a/libutf/utf.c
> +++ b/libutf/utf.c
> @@ -62,6 +62,18 @@ utfnlen(const char *s, size_t len)
> return i;
> }
>
> +size_t
> +utfmemlen(const char *s, size_t len)
> +{
> + const char *p = s;
> + size_t i;
> + Rune r;
> +
> + for(i = 0; p - s < len; i++)
> + p += chartorune(&r, p);
> + return i;
> +}
> +
> char *
> utfrune(const char *s, Rune r)
> {
> diff --git a/libutf/utftorunestr.c b/libutf/utftorunestr.c
> index 005fe8a..5da9d5f 100644
> --- a/libutf/utftorunestr.c
> +++ b/libutf/utftorunestr.c
> @@ -11,3 +11,15 @@ utftorunestr(const char *str, Rune *r)
>
> return i;
> }
> +
> +int
> +utfntorunestr(const char *str, size_t len, Rune *r)
> +{
> + int i, n;
> + const char *p = str;
> +
> + for(i = 0; (n = chartorune(&r[i], p)) && p - str < len; i++)
> + p += n;
> +
> + return i;
> +}
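I have a slight concern here (and in utfmemlen): chartorune is not
given a length, so if the string ends with a partial UTF-8 sequence we
may read past the end of the buffer. In utfntorunestr the call is also
evaluated before the p - str < len check, so it reads str[0] even when
len == 0. Perhaps we should use charntorune here?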
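To illustrate what I mean, here is a rough, self-contained sketch of a
length-bounded loop. decode_bounded() is a hypothetical, simplified
stand-in for charntorune (it does not reject overlong forms or
surrogates), and bounded_runelen() is a made-up name, not a proposal
for the final API. The point is only that the decoder is told how many
bytes remain and returns 0 when they do not hold a complete sequence.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Rune;		/* stand-in for libutf's Rune */
#define RUNEERROR 0xFFFD	/* stand-in for libutf's Runeerror */

/*
 * Hypothetical stand-in for charntorune(): decode one rune from at
 * most len bytes of s. Returns the number of bytes consumed, or 0 if
 * len bytes do not hold a complete sequence. Never reads s[len].
 */
static int
decode_bounded(Rune *r, const char *s, size_t len)
{
	const unsigned char *u = (const unsigned char *)s;
	int i, n;
	Rune v;

	if (len == 0)
		return 0;	/* cannot even look at s[0] */
	if (u[0] < 0x80) {
		*r = u[0];
		return 1;
	} else if ((u[0] & 0xE0) == 0xC0) {
		n = 2; v = u[0] & 0x1F;
	} else if ((u[0] & 0xF0) == 0xE0) {
		n = 3; v = u[0] & 0x0F;
	} else if ((u[0] & 0xF8) == 0xF0) {
		n = 4; v = u[0] & 0x07;
	} else {
		*r = RUNEERROR;	/* stray continuation or invalid lead byte */
		return 1;
	}

	for (i = 1; i < n; i++) {
		if ((size_t)i >= len)
			return 0;	/* sequence truncated by len */
		if ((u[i] & 0xC0) != 0x80) {
			*r = RUNEERROR;	/* not a continuation byte */
			return i;
		}
		v = (v << 6) | (u[i] & 0x3F);
	}
	*r = v;
	return n;
}

/* Count the complete runes in the first len bytes of s. */
size_t
bounded_runelen(const char *s, size_t len)
{
	const char *p = s;
	size_t i = 0;
	Rune r;
	int n;

	/* Check the bound first, then decode with the bytes that remain. */
	while ((size_t)(p - s) < len &&
	       (n = decode_bounded(&r, p, len - (size_t)(p - s))) > 0) {
		p += n;
		i++;
	}
	return i;
}
```

This sketch silently drops a truncated trailing sequence; counting it
as one Runeerror instead would be an equally reasonable policy.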
> diff --git a/libutil/unescape.c b/libutil/unescape.c
> index d8ed2a2..deca948 100644
> --- a/libutil/unescape.c
> +++ b/libutil/unescape.c
> @@ -21,7 +21,8 @@ unescape(char *s)
> ['n'] = '\n',
> ['r'] = '\r',
> ['t'] = '\t',
> - ['v'] = '\v'
> + ['v'] = '\v',
> + ['0'] = '\0'
I don't think this is necessary. '\0' should already be covered by the
octal escape handling below.
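As a rough illustration of why a generic octal branch already covers
this case, here is a minimal sketch. parse_octal_escape() is a
hypothetical helper, not the code in unescape(), and I am assuming a
limit of three octal digits (as in C string literals) with values
clamped to 255; the real limits may differ.

```c
#include <stddef.h>

/*
 * Hypothetical helper: s points just past a backslash. Parse up to
 * three octal digits, store the resulting byte in *out, and return
 * the number of digits consumed (0 if s does not start with one,
 * in which case *out is left untouched).
 */
static size_t
parse_octal_escape(const char *s, unsigned char *out)
{
	unsigned int v = 0;
	size_t n = 0;

	while (n < 3 && s[n] >= '0' && s[n] <= '7') {
		v = v * 8 + (unsigned int)(s[n] - '0');
		n++;
	}
	if (n > 0)
		*out = (unsigned char)(v > 255 ? 255 : v);
	return n;
}
```

With this shape, "\0" is simply a one-digit octal escape with value 0,
so a dedicated ['0'] table entry adds nothing.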