[hackers] [libgrapheme] Make lg_utf8_*() NULL-agnostic || Laslo Hunhold

git Tue, 14 Dec 2021 06:13:11 -0800

commit 08b2c8e4e5222c04f3304595720d195a98ac7e8a
Author:     Laslo Hunhold <[email protected]>
AuthorDate: Tue Dec 14 14:06:23 2021 +0100
Commit:     Laslo Hunhold <[email protected]>
CommitDate: Tue Dec 14 14:06:23 2021 +0100


    Make lg_utf8_*() NULL-agnostic
    
    The special cases of NULL buffers and allocated zero-length buffers
    (malloc(0) does not necessarily return NULL!) can be gracefully
    handled:
    
      lg_grapheme_nextbreak(NULL) -> 0
      lg_grapheme_isbreak(cp1, cp2, NULL) -> run without state
      lg_utf8_decode(NULL, 0, &cp) -> 0, cp=invalid (we consumed nothing
                                                     and the cp is invalid)
      lg_utf8_encode(cp, NULL, 0) -> number of bytes needed (good for a
                                     dry-run!)
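The dry-run case for lg_utf8_encode() suggests a two-pass pattern: ask for the length with a NULL buffer, allocate exactly that many bytes, then encode. The following is a minimal sketch of that pattern. It does not call libgrapheme. `sketch_utf8_encode()` and `encode_alloc()` are hypothetical stand-ins written only to follow the contract listed above: always return the number of bytes needed, and write only when the buffer is non-NULL and large enough. Mapping invalid code points to U+FFFD is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for lg_utf8_encode(), NOT libgrapheme's code.
 * Returns the number of bytes the UTF-8 sequence for cp needs and only
 * writes to s if s is non-NULL and n is large enough. Mapping invalid
 * code points to U+FFFD is an assumption of this sketch.
 */
static size_t
sketch_utf8_encode(uint_least32_t cp, uint8_t *s, size_t n)
{
	size_t len;

	if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
		cp = 0xFFFD; /* surrogate or out of range */
	}
	len = (cp <= 0x7F) ? 1 : (cp <= 0x7FF) ? 2 : (cp <= 0xFFFF) ? 3 : 4;

	if (s == NULL || n < len) {
		/* dry run or buffer too small: only report the size */
		return len;
	}

	switch (len) {
	case 1:
		s[0] = (uint8_t)cp;
		break;
	case 2:
		s[0] = (uint8_t)(0xC0 | (cp >> 6));
		s[1] = (uint8_t)(0x80 | (cp & 0x3F));
		break;
	case 3:
		s[0] = (uint8_t)(0xE0 | (cp >> 12));
		s[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
		s[2] = (uint8_t)(0x80 | (cp & 0x3F));
		break;
	default:
		s[0] = (uint8_t)(0xF0 | (cp >> 18));
		s[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
		s[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
		s[3] = (uint8_t)(0x80 | (cp & 0x3F));
		break;
	}
	return len;
}

/*
 * Hypothetical helper: encode cp into a freshly allocated buffer of
 * exactly the right size. Returns NULL if the allocation fails.
 */
static uint8_t *
encode_alloc(uint_least32_t cp, size_t *len)
{
	uint8_t *buf;
	size_t written;

	/* pass 1: dry run with a NULL buffer to learn the length */
	*len = sketch_utf8_encode(cp, NULL, 0);

	if ((buf = malloc(*len)) == NULL) {
		return NULL;
	}

	/* pass 2: the buffer is now large enough, so this call writes */
	written = sketch_utf8_encode(cp, buf, *len);
	assert(written == *len); /* both passes agree on the length */
	(void)written;

	return buf;
}
```

Because the return value is the required length in every case, a caller with a fixed-size buffer can also detect truncation by checking whether the result exceeds n, which mirrors the `1 + off > n` branch in the diff.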
    
    While the lg_grapheme_*() functions already handled these cases well,
    this commit amends the lg_utf8_*() functions to handle them too.
    
    Signed-off-by: Laslo Hunhold <[email protected]>

diff --git a/src/utf8.c b/src/utf8.c
index fe75eaa..b21c920 100644
--- a/src/utf8.c
+++ b/src/utf8.c
@@ -52,10 +52,10 @@ lg_utf8_decode(const uint8_t *s, size_t n, uint_least32_t *cp)
 {
        size_t off, i;
 
-       if (n == 0) {
+       if (s == NULL || n == 0) {
                /* a sequence must be at least 1 byte long */
                *cp = LG_CODEPOINT_INVALID;
-               return 1;
+               return 0;
        }
 
        /* identify sequence type with the first byte */
@@ -145,8 +145,12 @@ lg_utf8_encode(uint_least32_t cp, uint8_t *s, size_t n)
                        break;
                }
        }
-       if (1 + off > n) {
-               /* specified buffer is too small to store sequence */
+       if (1 + off > n || s == NULL || n == 0) {
+               /*
+                * specified buffer is too small to store sequence or
+                * the caller just wanted to know how many bytes the
+                * codepoint needs by passing a NULL-buffer.
+                */
                return 1 + off;
        }

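The decode hunk changes the empty-input result from 1 to 0, so the return value now always equals the number of bytes consumed. A minimal sketch of why that matters, assuming a decoder with that contract: a loop can stop purely on a zero return. `sketch_utf8_decode()`, `count_codepoints()` and `SKETCH_CP_INVALID` are hypothetical names, not libgrapheme API, and the handling of malformed sequences (consume what was read, report an invalid code point) is an assumption of the sketch rather than a description of what lg_utf8_decode() does.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for LG_CODEPOINT_INVALID; the value is assumed */
#define SKETCH_CP_INVALID UINT32_C(0xFFFD)

/*
 * Hypothetical stand-in for lg_utf8_decode(), NOT libgrapheme's code.
 * Returns the number of bytes consumed: 0 for a NULL or empty buffer,
 * otherwise at least 1 and at most n.
 */
static size_t
sketch_utf8_decode(const uint8_t *s, size_t n, uint_least32_t *cp)
{
	size_t len, i;
	uint_least32_t c, min;

	if (s == NULL || n == 0) {
		/* nothing consumed and no valid code point */
		*cp = SKETCH_CP_INVALID;
		return 0;
	}

	/* identify sequence length and payload bits from the first byte */
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		len = 2; c = s[0] & 0x1F; min = 0x80;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3; c = s[0] & 0x0F; min = 0x800;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4; c = s[0] & 0x07; min = 0x10000;
	} else {
		/* stray continuation byte or invalid lead byte */
		*cp = SKETCH_CP_INVALID;
		return 1;
	}

	for (i = 1; i < len; i++) {
		if (i >= n || (s[i] & 0xC0) != 0x80) {
			/* truncated or broken: consume only what was read */
			*cp = SKETCH_CP_INVALID;
			return i;
		}
		c = (c << 6) | (s[i] & 0x3F);
	}

	if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
		/* overlong form, out of range, or surrogate */
		*cp = SKETCH_CP_INVALID;
	} else {
		*cp = c;
	}
	return len;
}

/*
 * Count code points using only the return value as the loop condition.
 * This relies on decode returning 0 once no input is left; with the old
 * "return 1" behaviour the loop would step past the end of the buffer.
 */
static size_t
count_codepoints(const uint8_t *s, size_t n)
{
	size_t off = 0, count = 0, ret;
	uint_least32_t cp;

	/* avoid computing NULL + off when the caller passes a NULL buffer */
	while ((ret = sketch_utf8_decode(s == NULL ? NULL : s + off,
	                                 n - off, &cp)) > 0) {
		off += ret;
		count++;
	}
	return count;
}
```

An explicit `off < n` bound check would work just as well; the point of the sketch is only that a zero return can no longer be mistaken for progress through the input.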