On Sat, Nov 13, 2010 at 10:53 PM, Damian Okrasa <dokr...@gmail.com> wrote:
> I removed the wchar_t completely, added some UTF-8 parsing functions.
> No support for combining, bidi, doublecolumn etc. Markus Kuhn's UTF-8
> stress test file is not working 100% correctly (the decoder works
> however, even when reading bytes one by one).

I noticed in canstou():

	/* use this if your buffer is less than UTF_SIZ, it returns 1 if you can decode
	   UTF-8 otherwise return 0 */
	static int canstou(char *s, int b) {
		unsigned char c = *s;
		int n;

		if (b < 1)
			return 0;
		else if (~c&B7)
			return 1;
		else if ((c&(B7|B6|B5)) == (B7|B6))
			n = 1;
		else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
			n = 2;
		else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
			n = 3;
		else
			return 1;
		/* this is never reached: */
		for (--b,++s; n>0&&b>0; --n,--b,++s) {
			c = *s;
			if ((c&(B7|B6)) != B7)
				break;
		}
		if (n > 0 && b == 0)
			return 0;
		else
			return 1;
	}

If the current function is correct, then it can be simplified to:

	/* use this if your buffer is less than UTF_SIZ, it returns 1 if you can decode
	   UTF-8 otherwise return 0 */
	static int canstou(char *s, int b) {
		unsigned char c = *s;

		if (b < 1)
			return 0;
		else if (~c&B7)
			return 1;
		else if ((c&(B7|B6|B5)) == (B7|B6))
			return 1;
		else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
			return 2;
		else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
			return 3;
		return 1;
	}

The (b < 1) check probably shouldn't be there either.

Off-topic and not specifically aimed at you: I noticed the coding style
of st is quite ugly. There are lots of non-descriptive variable names
and recurring logic that could be grouped into functions, and the style
is inconsistent. dwm is a good example to look at imo; it's pretty
clean.

Kind regards,
Hiltjo
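
P.S. The "if the current function is correct" part could be checked by putting the two versions next to each other and feeding them the same bytes. This is only a rough sketch: the names canstou_orig, canstou_simple and agree are made up for the comparison, and the B3..B7 values are my assumption (plain bit masks), since the patch's own definitions are not quoted in this mail.

```c
/* Assumed bit masks; the real definitions from the patch are not shown here. */
enum { B3 = 8, B4 = 16, B5 = 32, B6 = 64, B7 = 128 };

/* canstou() as quoted from the patch (renamed for the comparison). */
static int canstou_orig(char *s, int b) {
	unsigned char c = *s;
	int n;

	if (b < 1)
		return 0;
	else if (~c&B7)
		return 1;
	else if ((c&(B7|B6|B5)) == (B7|B6))
		n = 1;
	else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
		n = 2;
	else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
		n = 3;
	else
		return 1;
	for (--b,++s; n>0&&b>0; --n,--b,++s) {
		c = *s;
		if ((c&(B7|B6)) != B7)
			break;
	}
	if (n > 0 && b == 0)
		return 0;
	else
		return 1;
}

/* The proposed simplification (renamed for the comparison). */
static int canstou_simple(char *s, int b) {
	unsigned char c = *s;

	if (b < 1)
		return 0;
	else if (~c&B7)
		return 1;
	else if ((c&(B7|B6|B5)) == (B7|B6))
		return 1;
	else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
		return 2;
	else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
		return 3;
	return 1;
}

/* Hypothetical helper: 1 if both versions give the same yes/no answer. */
static int agree(char *s, int b) {
	return !canstou_orig(s, b) == !canstou_simple(s, b);
}
```

The interesting inputs are a complete sequence such as "\xc3\xa9" with b = 2 and the same lead byte alone with b = 1. If agree() returns 0 for the truncated case, then the loop does real work and the simplification is not equivalent.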