On Mon, 3 Jul 2023 14:12:03 +0700 Hairy Pixels via fpc-pascal <fpc-pascal@lists.freepascal.org> wrote:
> > On Jul 3, 2023, at 2:04 PM, Tomas Hajny via fpc-pascal
> > <fpc-pascal@lists.freepascal.org> wrote:
> >
> > No - in this case, the "header" is the highest bit of that byte
> > being 0.
>
> Oh it's the header BIT. Admittedly I don't understand how this
> function returns the highest bit using that case, which I think he
> was suggesting.

The first byte of a UTF-8 codepoint is in 0..127 or 192..247. The
second, third and fourth bytes are always in 128..191, so you can
easily detect where a codepoint starts. From the first byte you can
also derive the length of the codepoint.

If you just want to skip over n codepoints, the function below does
the job:

  function UTF8CodepointSizeFast(p: PChar): integer;
  begin
    case p^ of
    #0..#191   : Result := 1; // ASCII, or a stray continuation byte
    #192..#223 : Result := 2;
    #224..#239 : Result := 3;
    #240..#247 : Result := 4;
    else Result := 1; // An optimization + prevents compiler warning about uninitialized Result.
    end;
  end;

Mattias
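A minimal sketch of both ideas above, assuming Free Pascal in objfpc mode. UTF8CountCodepoints finds codepoint starts by counting every byte that is not a continuation byte (128..191, i.e. `(b and $C0) = $80`), and UTF8SkipCodepoints advances a PChar over n codepoints using the size derived from the first byte. Both helper names and the sample bytes are hypothetical illustrations, not part of LazUTF8 or any other library, and neither routine validates overlong or otherwise malformed sequences.

```pascal
program SkipCodepoints;

{$mode objfpc}{$H+}

const
  // Hypothetical sample: 'a', U+00E9, U+20AC, U+1F600, 'b' = 1+2+3+4+1 bytes, then #0
  Sample: array[0..11] of Char =
    ('a', #$C3, #$A9, #$E2, #$82, #$AC, #$F0, #$9F, #$98, #$80, 'b', #0);

// Byte length of the codepoint starting at p, judged from its first byte only.
function UTF8CodepointSizeFast(p: PChar): integer;
begin
  case p^ of
    #0..#191   : Result := 1; // ASCII, or a stray continuation byte
    #192..#223 : Result := 2;
    #224..#239 : Result := 3;
    #240..#247 : Result := 4;
  else
    Result := 1;              // 248..255 never start a valid codepoint
  end;
end;

// Hypothetical helper: counts codepoint starts, i.e. bytes that are not 10xxxxxx.
function UTF8CountCodepoints(p: PChar): integer;
begin
  Result := 0;
  while p^ <> #0 do
  begin
    if (Ord(p^) and $C0) <> $80 then
      Inc(Result);
    Inc(p);
  end;
end;

// Hypothetical helper: skips n codepoints; advances byte by byte so a
// truncated sequence at the end can never step past the #0 terminator.
function UTF8SkipCodepoints(p: PChar; n: integer): PChar;
var
  Len: integer;
begin
  while (n > 0) and (p^ <> #0) do
  begin
    Len := UTF8CodepointSizeFast(p);
    while (Len > 0) and (p^ <> #0) do
    begin
      Inc(p);
      Dec(Len);
    end;
    Dec(n);
  end;
  Result := p;
end;

var
  Start, p: PChar;
begin
  Start := @Sample[0];
  WriteLn('Codepoints: ', UTF8CountCodepoints(Start));           // 5
  p := UTF8SkipCodepoints(Start, 4);
  WriteLn('Byte offset after 4 codepoints: ', p - Start);        // 10
  WriteLn('Next char: ', p^);                                    // b
end.
```

Skipping only needs the first byte of each codepoint, which is why a plain `case` on `p^` is enough; counting or resynchronising from an arbitrary position relies instead on the fact that continuation bytes always have the top two bits `10`.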