On Thu, 28 May 2020 16:45:20 +0200 Mattias Andrée <maand...@kth.se> wrote:
> Hi Laslo, > > On Thu, 28 May 2020 15:53:32 +0200 > Laslo Hunhold <d...@frign.de> wrote: > > > On Thu, 28 May 2020 13:48:18 +0200 > > Mattias Andrée <maand...@kth.se> wrote: > > > > Dear Mattias, > > > > > Looks good, and I especially like the simplification it brings for > > > using partially loaded strings. > > > > I'm glad to hear that. Thanks! > > > > > However, I have three minor comments: > > > > > > I preferred `lut[off].mask` over `(lut[off].upper - lut[off].lower)`. > > > It is clearer what it means, and storing the mask in `lut` doesn't > > > even increase its size since it is padded anyway because `mincp` is > > > (atleast on x86-64 and i386) aligned to 4 bytes. An alternative, > > > is to use `~lut[off].lower` which I think is clearer than > > > `(lut[off].upper - lut[off].lower)`, but again, I prefer > > > `lut[off].mask`. You could also write > > > *cp = s[0] - lut[off].lower; > > > I think this alternative is about as clear as using `lut[off].mask`. > > > > I was first vary of this way, because it would be problematic if s[0] < > > lut[off].lower, but because we check this beforehand this is possible. > > I'll note it and add it later. > > > > > In POSIX (but not Linux) `1 << 16` can be either 0, 1, or 2¹⁶, > > > since `1` is an `int` which minimum width is 16, not 32. Similarly, > > > `0x10FFFF` could overflow to 0xFFFF. > > > > So would you recommend an explicit cast to uint32_t, i.e. > > > > (uint32_t)1 << 16 > > > > to overcome this? > > Yes. Don't forget about 0x10FFFF, where you need UINT32_C(0x10FFFF). Casting doesn't work here unless you also add the L suffix. > > > > > > I think `(s[i] & 0xC0) != 0x80` is clearer than `!BETWEEN(s[i], 0x80, > > > 0xBF)`, but since you changed this I assume you disagree. > > > > I don't disagree either way. The comment I added above is sufficient in > > terms of readability. I'm not a big fan of micro-optimizations and > > prefer higher "readability". Both solutions are readable enough, given > > a proper comment, but I just went with the "BETWEEN"-approach as it is > > similar to how we check it earlier. > > > > With best regards > > > > Laslo > > > > PS: No need to CC me, I am subscribed to the list. :P > > > > Sorry, I pressing reply to all instead of reply to list > without really looking, I need to remove the former option. > > > Regards, > Mattias Andrée >