On 2020-03-27, Richard Ipsum wrote:
> POSIX specifies that -d '\0' sets the delimiter to an empty string.
> ---
>  libutf/utf.c          | 12 ++++++++++++
>  libutf/utftorunestr.c | 12 ++++++++++++
>  paste.c               | 27 +++++++++++++++------------
>  utf.h                 |  4 +++-
>  4 files changed, 42 insertions(+), 13 deletions(-)
>
> diff --git a/libutf/utf.c b/libutf/utf.c
> index 897c5ef..fc78f29 100644
> --- a/libutf/utf.c
> +++ b/libutf/utf.c
> @@ -62,6 +62,18 @@ utfnlen(const char *s, size_t len)
>       return i;
>  }
>
> +size_t
> +utfmemlen(const char *s, size_t len)
> +{
> +     const char *p = s, *end = s + len;
> +     size_t i;
> +     Rune r;
> +
> +     for(i = 0; p < end; i++)
> +             p += charntorune(&r, p, end - p);
> +     return i;
> +}

It looks like charntorune can return 0 even when p < end: if it hits a
truncated UTF-8 sequence at the end of the buffer, p never advances and
this loop never terminates.

I think a loop like the one in utfntorunestr, which stops as soon as
charntorune returns 0, should work here.
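A minimal sketch of that fix, assuming the loop should simply stop at the first point where the decoder makes no progress, so a truncated trailing sequence is not counted. To keep it self-contained, `demo_charntorune` is a hypothetical stand-in for libutf's charntorune that models only the behavior discussed here (it returns 0 for an empty buffer or a truncated sequence and does not validate continuation bytes), and `Rune` is typedef'd locally as an assumption.

```c
#include <stddef.h>

typedef int Rune; /* assumption: local stand-in for libutf's Rune */

/* Hypothetical stand-in for charntorune: returns the number of bytes
 * consumed, or 0 if len is 0 or the sequence is truncated. */
size_t
demo_charntorune(Rune *r, const char *s, size_t len)
{
	unsigned char c;
	size_t n, k;
	Rune v;

	if(len == 0)
		return 0;               /* nothing left to decode */
	c = (unsigned char)s[0];
	if(c >= 0xF0)
		n = 4;
	else if(c >= 0xE0)
		n = 3;
	else if(c >= 0xC0)
		n = 2;
	else
		n = 1;                  /* ASCII, or a stray byte taken as-is */
	if(len < n)
		return 0;               /* truncated sequence */
	v = (n == 1) ? c : (c & (0x7F >> n));
	for(k = 1; k < n; k++)
		v = (v << 6) | ((unsigned char)s[k] & 0x3F);
	*r = v;
	return n;
}

/* Count runes in s[0..len), stopping when the decoder makes no progress
 * instead of spinning on a truncated trailing sequence. */
size_t
utfmemlen(const char *s, size_t len)
{
	const char *p = s, *end = s + len;
	size_t i, n;
	Rune r;

	for(i = 0; (n = demo_charntorune(&r, p, end - p)); i++)
		p += n;
	return i;
}
```

With this version `utfmemlen("a\xE2\x82", 3)` returns 1 rather than looping forever. Whether dangling bytes should instead be counted as error runes is a separate design choice that this sketch leaves open.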

> +
>  char *
>  utfrune(const char *s, Rune r)
>  {
> diff --git a/libutf/utftorunestr.c b/libutf/utftorunestr.c
> index 005fe8a..d350c77 100644
> --- a/libutf/utftorunestr.c
> +++ b/libutf/utftorunestr.c
> @@ -11,3 +11,15 @@ utftorunestr(const char *str, Rune *r)
>
>       return i;
>  }
> +
> +int
> +utfntorunestr(const char *str, size_t len, Rune *r)
> +{
> +     int i, n;
> +     const char *p = str, *end = str + len;
> +
> +     for(i = 0; (n = charntorune(&r[i], p, end - p)) && p < end; i++)
> +             p += n;

I don't think the `&& p < end` is necessary, since if p == end,
charntorune returns 0.

Also, I think this function should return size_t, not int. I pushed a
change to make utftorunestr return size_t as well.
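A sketch of utfntorunestr with both suggestions applied. It reuses the same hypothetical `demo_charntorune` stand-in (redefined so the block compiles on its own) and assumes the caller's `r` has room for at least `len` runes, since every decoded rune consumes at least one byte. The `&& p < end` test is gone because the decoder already returns 0 when `end - p` is 0, and the count is returned as size_t.

```c
#include <stddef.h>

typedef int Rune; /* assumption: local stand-in for libutf's Rune */

/* Hypothetical stand-in for charntorune: returns bytes consumed, or 0 if
 * len is 0 or the sequence is truncated; *r is written only on success. */
size_t
demo_charntorune(Rune *r, const char *s, size_t len)
{
	unsigned char c;
	size_t n, k;
	Rune v;

	if(len == 0)
		return 0;
	c = (unsigned char)s[0];
	n = c >= 0xF0 ? 4 : c >= 0xE0 ? 3 : c >= 0xC0 ? 2 : 1;
	if(len < n)
		return 0;               /* truncated sequence */
	v = (n == 1) ? c : (c & (0x7F >> n));
	for(k = 1; k < n; k++)
		v = (v << 6) | ((unsigned char)s[k] & 0x3F);
	*r = v;
	return n;
}

/* Decode str[0..len) into r; returns the number of runes written. */
size_t
utfntorunestr(const char *str, size_t len, Rune *r)
{
	const char *p = str, *end = str + len;
	size_t i, n;

	/* the decoder returns 0 once p == end, so no separate bound check */
	for(i = 0; (n = demo_charntorune(&r[i], p, end - p)); i++)
		p += n;
	return i;
}
```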

> +
> +     return i;
> +}
