> On Jul 3, 2023, at 11:43 AM, Mattias Gaertner via fpc-pascal <fpc-pascal@lists.freepascal.org> wrote:
>
> There is a header byte.
>
> It depends, if you want to check for invalid UTF-8 sequences.
>
> From LazUTF8:
>
> function UTF8CodepointSizeFast(p: PChar): integer;
> begin
>   case p^ of
>   #0..#191   : Result := 1;
>   #192..#223 : Result := 2;
>   #224..#239 : Result := 3;
>   #240..#247 : Result := 4;
>   else Result := 1; // An optimization + prevents compiler warning about uninitialized Result.
>   end;
> end;

This is a header for the file? Does that mean the file itself must have uniform character sizes? I thought the idea was to read the file one byte at a time, but I don't understand how you would know whether a 1-byte character (like ASCII) was part of a 4-byte character or not.

Regards,
Ryan Joseph

_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
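The following is a minimal Free Pascal sketch (objfpc mode) of the point being discussed. UTF-8 has no per-file header. Each character's own first byte (the "header byte") gives that character's length. Continuation bytes always fall in #128..#191 (binary 10xxxxxx), so an ASCII byte (#0..#127) can never sit in the middle of a multi-byte character. If you walk a string from its first byte and jump ahead by each lead byte's length, you always land on a character boundary.

The helpers SeqLen, DecodeAt and CountCodePoints are hypothetical and are not part of the LazUTF8 API. SeqLen follows the ranges in UTF8CodepointSizeFast, except that it returns 0 for bytes that cannot start a sequence. The decoder checks continuation bytes, but as a simplification it does not reject overlong forms, surrogates, or values above U+10FFFF.

```pascal
program Utf8Walk;

{$mode objfpc}{$H+}

uses
  SysUtils; // IntToHex

// Hypothetical helper: byte length of the UTF-8 sequence starting with lead
// byte B. Same ranges as LazUTF8's UTF8CodepointSizeFast, except that bytes
// which cannot start a sequence ($80..$BF, $F8..$FF) return 0 instead of 1.
function SeqLen(B: Byte): Integer;
begin
  case B of
    $00..$7F: Result := 1; // ASCII, never part of a longer sequence
    $C0..$DF: Result := 2;
    $E0..$EF: Result := 3;
    $F0..$F7: Result := 4;
  else
    Result := 0;
  end;
end;

// Hypothetical helper: decodes the sequence starting at S[I] (1-based).
// Returns False for an invalid lead byte, a truncated sequence, or a
// following byte that is not a continuation byte.
function DecodeAt(const S: AnsiString; I: Integer; out CP: LongWord;
  out Len: Integer): Boolean;
var
  K: Integer;
  B: Byte;
begin
  Result := False;
  CP := 0;
  B := Ord(S[I]);
  Len := SeqLen(B);
  if (Len = 0) or (I + Len - 1 > Length(S)) then
    Exit;
  case Len of
    1: CP := B;
    2: CP := B and $1F; // 110xxxxx
    3: CP := B and $0F; // 1110xxxx
    4: CP := B and $07; // 11110xxx
  end;
  for K := 1 to Len - 1 do
  begin
    B := Ord(S[I + K]);
    if (B and $C0) <> $80 then // continuation bytes are always 10xxxxxx
      Exit;
    CP := (CP shl 6) or (B and $3F);
  end;
  Result := True;
end;

// Hypothetical helper: counts code points by jumping from lead byte to lead
// byte. Returns -1 for malformed input.
function CountCodePoints(const S: AnsiString): Integer;
var
  I, Len: Integer;
  CP: LongWord;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if not DecodeAt(S, I, CP, Len) then
      Exit(-1);
    Inc(I, Len);
    Inc(Result);
  end;
end;

const
  // 'a', U+00E9, U+20AC, U+1F600 as raw bytes, independent of source encoding
  Sample: array[1..10] of Byte =
    ($61, $C3, $A9, $E2, $82, $AC, $F0, $9F, $98, $80);

var
  S: AnsiString;
  I, Len: Integer;
  CP: LongWord;
begin
  SetLength(S, Length(Sample));
  Move(Sample[1], S[1], Length(Sample));
  I := 1;
  while I <= Length(S) do
  begin
    if not DecodeAt(S, I, CP, Len) then
    begin
      WriteLn('malformed UTF-8 at byte ', I);
      Halt(1);
    end;
    WriteLn('byte ', I, ': U+', IntToHex(Int64(CP), 4), ', ', Len, ' byte(s)');
    Inc(I, Len);
  end;
  WriteLn('code points: ', CountCodePoints(S));
end.
```

Run as written, the program should report the four characters at byte offsets 1, 2, 4 and 7 (U+0061, U+00E9, U+20AC, U+1F600), followed by "code points: 4". The walk never reads byte 3, 5, 6, 8, 9 or 10 as a character of its own. That is why a single pass from the start of the data is enough, with no header describing the whole file.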