Re: [RFC PATCH 6/6] utf8.c: avoid char overflow

Beat Bolli Mon, 09 Jul 2018 08:00:37 -0700

Hi Dscho

On 09.07.2018 15:14, Johannes Schindelin wrote:

Hi Beat,


On Sun, 8 Jul 2018, Beat Bolli wrote:

In ISO C, char constants must be in the range -128..127. Change the BOM
constants to unsigned char to avoid overflow.

Signed-off-by: Beat Bolli <[email protected]>
---
 utf8.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/utf8.c b/utf8.c
index d55e20c641..833ce00617 100644
--- a/utf8.c
+++ b/utf8.c
@@ -561,15 +561,15 @@ char *reencode_string_len(const char *in, int insz,
 #endif

 static int has_bom_prefix(const char *data, size_t len,
-                         const char *bom, size_t bom_len)
+                         const unsigned char *bom, size_t bom_len)
 {
 	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
 }

-static const char utf16_be_bom[] = {0xFE, 0xFF};
-static const char utf16_le_bom[] = {0xFF, 0xFE};
-static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
-static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
+static const unsigned char utf16_be_bom[] = {0xFE, 0xFF};
+static const unsigned char utf16_le_bom[] = {0xFF, 0xFE};
+static const unsigned char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
+static const unsigned char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};


An alternative approach that might be easier to read (and avoids the
confusion arising from our use of (signed) chars for strings pretty much
everywhere):

#define FE ((char)0xfe)
#define FF ((char)0xff)

...

I tried this first (without the macros, though) and thought it looked
really ugly. That's why I chose this solution. The usage is pretty local
and close to the function has_bom_prefix().
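
For illustration only, here is a minimal standalone sketch of the shape
the patch gives this code. It assumes nothing beyond what is in the diff,
except for bom_selfcheck(), which is a hypothetical helper and not part
of utf8.c. The point it shows: memcmp() compares its operands as
unsigned char, so the text buffer can stay plain char while the BOM
tables are unsigned char.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the patched helper: only the BOM pointer is unsigned. */
static int has_bom_prefix(const char *data, size_t len,
			  const unsigned char *bom, size_t bom_len)
{
	/* memcmp() interprets both buffers as unsigned char bytes. */
	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
}

/* 0xFE and 0xFF are representable in unsigned char, so nothing overflows. */
static const unsigned char utf16_be_bom[] = {0xFE, 0xFF};
static const unsigned char utf16_le_bom[] = {0xFF, 0xFE};

/* Hypothetical self-check, not part of utf8.c. */
static void bom_selfcheck(void)
{
	/* Text buffers remain plain char, as elsewhere in the code base. */
	const char be_text[] = {'\xFE', '\xFF', 0, 'a'};
	const char le_text[] = {'\xFF', '\xFE', 'a', 0};

	assert(has_bom_prefix(be_text, sizeof(be_text),
			      utf16_be_bom, sizeof(utf16_be_bom)));
	assert(has_bom_prefix(le_text, sizeof(le_text),
			      utf16_le_bom, sizeof(utf16_le_bom)));
	/* A big-endian BOM must not match little-endian input. */
	assert(!has_bom_prefix(le_text, sizeof(le_text),
			       utf16_be_bom, sizeof(utf16_be_bom)));
}
```

Callers keep passing const char * data unchanged; only the tables and
the one parameter next to them change type.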

Would an explanatory comment help?

Beat
