On Sat, Nov 13, 2010 at 10:53 PM, Damian Okrasa <dokr...@gmail.com> wrote:
> I removed the wchar_t completely, added some UTF-8  parsing functions.
> No support for combining, bidi, doublecolumn etc. Markus Kuhn's UTF-8
> stress test file is not working 100% correctly (the decoder works
> however, even when reading bytes one by one).
>

I noticed in canstou():

/* use this if your buffer is less than UTF_SIZ, it returns 1 if you can decode
   UTF-8 otherwise return 0 */
static int canstou(char *s, int b) {
        unsigned char c = *s;
        int n;

        if (b < 1)
                return 0;
        else if (~c&B7)
                return 1;
        else if ((c&(B7|B6|B5)) == (B7|B6))
                n = 1;
        else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
                n = 2;
        else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
                n = 3;
        else
                return 1;

        |
        v this is never reached.
        for (--b,++s; n>0&&b>0; --n,--b,++s) {
                c = *s;
                if ((c&(B7|B6)) != B7)
                        break;
        }
        if (n > 0 && b == 0)
                return 0;
        else
                return 1;
}


If the current function is correct, then it can be simplified to:


/* use this if your buffer is less than UTF_SIZ, it returns 1 if you can decode
   UTF-8 otherwise return 0 */
static int canstou(char *s, int b) {
        unsigned char c = *s;

        if (b < 1)
                return 0;
        else if (~c&B7)
                return 1;
        else if ((c&(B7|B6|B5)) == (B7|B6))
                return 1;
        else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
                return 2;
        else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
                return 3;
        return 1;
}

The (b < 1) check probably shouldn't be there either.
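
For anyone following along, the lead-byte tests used in both versions can
be tried in isolation. This is only a minimal sketch: it assumes B3..B7 are
single-bit masks (8..128), since their definitions are not quoted above,
and trailing() and check_trailing() are made-up names, not something in st.

```c
#include <assert.h>

/* Assumed definitions: single-bit masks, as the names suggest. */
enum { B3 = 8, B4 = 16, B5 = 32, B6 = 64, B7 = 128 };

/* Hypothetical helper, not part of st: number of continuation bytes
   announced by lead byte c, or -1 if c cannot start a sequence. */
int trailing(unsigned char c) {
        if (~c & B7)
                return 0;       /* 0xxxxxxx: ASCII */
        else if ((c & (B7|B6|B5)) == (B7|B6))
                return 1;       /* 110xxxxx */
        else if ((c & (B7|B6|B5|B4)) == (B7|B6|B5))
                return 2;       /* 1110xxxx */
        else if ((c & (B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
                return 3;       /* 11110xxx */
        return -1;              /* continuation byte or invalid lead */
}

/* A few spot checks. */
void check_trailing(void) {
        assert(trailing('a') == 0);
        assert(trailing(0xC3) == 1);    /* lead byte of U+00E9 */
        assert(trailing(0xE2) == 2);    /* lead byte of U+20AC */
        assert(trailing(0xF0) == 3);
        assert(trailing(0x80) == -1);   /* bare continuation byte */
}
```

Returning -1 for a byte that cannot start a sequence is my own choice for
the sketch; canstou() returns 1 in that case.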

Off-topic and not specifically aimed at you:
I noticed the coding style of st is quite ugly: lots of
non-descriptive variable names, recurring logic that could be grouped
into functions, and general inconsistency. dwm would be a good example
to follow imo, it's pretty clean.

Kind regards,
Hiltjo
