On 9/16/13, Hans-Peter Diettrich <drdiettri...@aol.com> wrote:
>
> Did you also test the simpler approach, replicating the pattern in one
> loop? It's independent of endianness, and can boil down to a single
> machine instruction (x86: REP MOVS).

It would be repeating either 2, 3, or 4 bytes each time. How would you
code that?

Simplified version, should be endian-safe:

function Utf8StringOfChar(AUtf8Char: Utf8String; N: Integer): Utf8String;
var
  UCharLen, i: Integer;
  C1, C2, C3: Char;
  PC: PChar;
begin
  Result := '';
  // Only accept exactly one UTF-8 character and a positive count.
  if (N <= 0) or (Utf8Length(AUtf8Char) <> 1) then Exit;
  UCharLen := Length(AUtf8Char);  // length in bytes
  case UCharLen of
    1: Result := StringOfChar(AUtf8Char[1], N);
    2: begin
         SetLength(Result, 2 * N);
         // The word is read and written in native byte order,
         // so the byte sequence is preserved.
         System.FillWord(Result[1], N, PWord(Pointer(AUtf8Char))^);
       end;
    3: begin
         SetLength(Result, 3 * N);
         C1 := AUtf8Char[1];
         C2 := AUtf8Char[2];
         C3 := AUtf8Char[3];
         PC := PChar(Result);
         for i := 1 to N do
         begin
           PC^ := C1; inc(PC);
           PC^ := C2; inc(PC);
           PC^ := C3; inc(PC);
         end;
       end;
    4: begin
         SetLength(Result, 4 * N);
         System.FillDWord(Result[1], N, PDWord(Pointer(AUtf8Char))^);
       end;
    else
    begin
      //In November 2003 UTF-8 was restricted by RFC 3629 to four bytes to match
      //the constraints of the UTF-16 character encoding.
      //http://en.wikipedia.org/wiki/UTF-8
      Result := StringOfChar('?', N);
    end;
  end;
end;

Bart

--
_______________________________________________
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
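
For comparison with the per-length branches above, here is one possible reading of the quoted "replicate the pattern in one loop" idea. It is only a sketch, not the code Hans-Peter had in mind: RepeatPattern is a hypothetical name, and instead of an overlapping REP MOVS it seeds the buffer with one copy of the pattern and then copies the already filled prefix onto the remainder, doubling it each pass. An overlapping copy is avoided on purpose, because FPC's Move is documented to handle overlapping ranges correctly and so would not replicate the pattern. The sketch works on bytes only, so it is independent of endianness and of the pattern length, and it assumes the caller has already checked that the pattern is a single UTF-8 character.

```pascal
program RepeatPatternDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not from the thread): repeats Pattern N times by
// copying the already filled prefix onto the rest of the buffer.
function RepeatPattern(const Pattern: AnsiString; N: Integer): AnsiString;
var
  PatLen, Total, Filled, Chunk: Integer;
begin
  Result := '';
  PatLen := Length(Pattern);
  if (N <= 0) or (PatLen = 0) then Exit;
  Total := PatLen * N;  // assumes the product fits in an Integer
  SetLength(Result, Total);
  // Seed the buffer with one copy of the pattern.
  Move(Pattern[1], Result[1], PatLen);
  Filled := PatLen;
  while Filled < Total do
  begin
    // Copy as much of the filled prefix as still fits.
    Chunk := Filled;
    if Chunk > Total - Filled then
      Chunk := Total - Filled;
    // Source and destination do not overlap, so this is a plain block copy.
    Move(Result[1], Result[Filled + 1], Chunk);
    Inc(Filled, Chunk);
  end;
end;

begin
  // #$E2#$82#$AC is the 3-byte UTF-8 encoding of U+20AC (euro sign).
  if RepeatPattern(#$E2#$82#$AC, 3) <> #$E2#$82#$AC#$E2#$82#$AC#$E2#$82#$AC then
    Halt(1);
  if RepeatPattern('ab', 0) <> '' then
    Halt(1);
  WriteLn('ok');
end.
```

The filled prefix is always a whole number of patterns, so every copy starts on a pattern boundary. The number of Move calls grows with log2(N) rather than N; whether that beats FillWord/FillDWord or the explicit 3-byte loop would have to be measured.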