On 2020-03-09, Richard Ipsum <[email protected]> wrote:
> POSIX specifies that -d '\0' sets the delimiter to an empty string.

Hi Richard,

Sorry for the delay on the review. This mostly looks good. Just a few
questions/comments.


> diff --git a/libutf/utf.c b/libutf/utf.c
> index 897c5ef..cf46e57 100644
> --- a/libutf/utf.c
> +++ b/libutf/utf.c
> @@ -62,6 +62,18 @@ utfnlen(const char *s, size_t len)
>       return i;
>  }
>
> +size_t
> +utfmemlen(const char *s, size_t len)
> +{
> +     const char *p = s;
> +     size_t i;
> +     Rune r;
> +
> +     for(i = 0; p - s < len; i++)
> +             p += chartorune(&r, p);
> +     return i;
> +}
> +
>  char *
>  utfrune(const char *s, Rune r)
>  {
> diff --git a/libutf/utftorunestr.c b/libutf/utftorunestr.c
> index 005fe8a..5da9d5f 100644
> --- a/libutf/utftorunestr.c
> +++ b/libutf/utftorunestr.c
> @@ -11,3 +11,15 @@ utftorunestr(const char *str, Rune *r)
>
>       return i;
>  }
> +
> +int
> +utfntorunestr(const char *str, size_t len, Rune *r)
> +{
> +     int i, n;
> +     const char *p = str;
> +
> +     for(i = 0; (n = chartorune(&r[i], p)) && p - str < len; i++)
> +             p += n;
> +
> +     return i;
> +}

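I have a slight concern here (and in utfmemlen): chartorune takes no
length bound, so if the buffer ends with a partial UTF-8 sequence we
may read past the end of it. In utfntorunestr, chartorune is also
called before the p - str < len check, so the same can happen when
len == 0. Perhaps we should use charntorune here?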
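
To illustrate what I mean by a bounded loop, here is a rough, untested
sketch. It does not use libutf: seqlen_bounded is a hypothetical
stand-in for charntorune, and I am assuming charntorune returns 0 when
the sequence is incomplete within len. The name count_bounded is also
made up for the example.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a bounded decoder such as charntorune():
 * returns the byte length of the UTF-8 sequence starting at s, or 0
 * if fewer than that many bytes remain. It never reads past s + len.
 * Only the lead byte is inspected; continuation bytes are not validated.
 */
static size_t
seqlen_bounded(const char *s, size_t len)
{
	unsigned char c;
	size_t n;

	if (len == 0)
		return 0;
	c = (unsigned char)s[0];
	if (c < 0x80)
		n = 1;
	else if ((c & 0xE0) == 0xC0)
		n = 2;
	else if ((c & 0xF0) == 0xE0)
		n = 3;
	else if ((c & 0xF8) == 0xF0)
		n = 4;
	else
		n = 1; /* stray continuation or invalid lead byte: consume one byte */
	return n <= len ? n : 0;
}

/* Count code points in the first len bytes of s, stopping at a truncated tail. */
size_t
count_bounded(const char *s, size_t len)
{
	size_t off = 0, i = 0, n;

	/* The bound is checked before every decode, so len == 0 reads nothing. */
	while (off < len && (n = seqlen_bounded(s + off, len - off)) > 0) {
		off += n;
		i++;
	}
	return i;
}
```

Whether a truncated trailing sequence should be dropped (as above) or
counted as one invalid rune is a design choice; the point is only that
the remaining length is passed down to the decoder on every step.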

> diff --git a/libutil/unescape.c b/libutil/unescape.c
> index d8ed2a2..deca948 100644
> --- a/libutil/unescape.c
> +++ b/libutil/unescape.c
> @@ -21,7 +21,8 @@ unescape(char *s)
>               ['n'] = '\n',
>               ['r'] = '\r',
>               ['t'] = '\t',
> -             ['v'] = '\v'
> +             ['v'] = '\v',
> +             ['0'] = '\0'

I think this is not necessary. It should be handled by the octal
escape handling below.
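
For reference, this is roughly the behaviour I have in mind for an
octal escape branch. It is a standalone sketch, not the code in
unescape.c: octal_escape_value is a hypothetical helper, and the
three-digit limit and the clamp to 255 are assumptions made for the
example.

```c
#include <stddef.h>

/*
 * Hypothetical helper: decode up to three octal digits at s (the text
 * right after a backslash). Returns the decoded byte value, clamped to
 * 255, or -1 if s does not start with an octal digit.
 */
int
octal_escape_value(const char *s)
{
	unsigned int q = 0;
	size_t n = 0;

	while (n < 3 && s[n] >= '0' && s[n] <= '7') {
		q = q * 8 + (unsigned int)(s[n] - '0');
		n++;
	}
	if (n == 0)
		return -1;
	return q > 255 ? 255 : (int)q;
}
```

With a branch like this, the text after the backslash in "\0" decodes
to the value 0 with no table entry needed. Note also that
['0'] = '\0' stores 0, which is the value every unlisted entry of an
initialized array already holds, so the entry by itself changes
nothing in the table.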
