Hello Lazarus-List,

Wednesday, March 3, 2010, 12:24:35 AM, you wrote:

RH> Pls check the function I used for check UTF8 string. Hope it helpful

RH> function IsUTF8(UnknownStr:string):boolean;

Well, there are a lot of UTF8 strings that do not pass your checks ;)
If you remove the low ASCII control chars, what happens with the UTF8
control chars?

RH> var
RH>   i :Integer;
RH> begin
RH>   if length(UnknownStr)=0 then exit(true);
RH>   i:=1;
RH>   while i<length(UnknownStr) do
RH>   begin
RH>     // ASCII
RH>     if (UnknownStr[i] = #$09) or
RH>        (UnknownStr[i] = #$0A) or
RH>        (UnknownStr[i] = #$0D) or
RH>        (UnknownStr[i] in [#$20..#$7E]) then
RH>     begin
RH>       inc(i);
RH>       continue;
RH>     end;

RH>     // non-overlong 2-byte
RH>     if (UnknownStr[i] in [#$C2..#$DF]) and
RH>        (UnknownStr[i+1] in [#$80..#$BF]) then
RH>     begin

It should crash here with strings like:

var
  s: string;
begin
  s:=#$C2;
  IsUTF8(s);
end;

which is not valid UTF8.

RH>     // excluding surrogates
RH>        ((UnknownStr[i]=#$ED) and
RH>         (UnknownStr[i+1] in [#$80..#$9F]) and
RH>         (UnknownStr[i+2] in [#$80..#$BF])) then

Surrogates are not valid UTF8 codepoints.

-- 
Best regards,
 JoshyFun

--
_______________________________________________
Lazarus mailing list
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
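
For reference, here is a minimal sketch of a checker that addresses the two points raised above: it accepts every ASCII byte (control characters included), and it checks that enough bytes remain before reading any continuation byte. It is not RH's function and not code from the thread. The name IsValidUTF8 and the RawByteString parameter are assumptions made for illustration (RawByteString assumes a recent Free Pascal, 3.x), and the byte ranges follow the well-formed sequences table of RFC 3629.

```pascal
program Utf8CheckSketch;
{$mode objfpc}{$H+}

// Hypothetical helper (not from the thread): returns True when S is
// well-formed UTF-8 according to the byte ranges in RFC 3629.
function IsValidUTF8(const S: RawByteString): Boolean;
var
  i, n, need, k: Integer;
  b, minB, maxB: Byte;  // minB..maxB: allowed range of the next continuation byte
begin
  Result := False;
  n := Length(S);
  i := 1;
  while i <= n do       // "<=" so that the last byte is examined too
  begin
    b := Ord(S[i]);
    minB := $80;
    maxB := $BF;
    case b of
      $00..$7F: need := 0;                          // ASCII, control chars included
      $C2..$DF: need := 1;                          // 2-byte; C0 and C1 would be overlong
      $E0:      begin need := 2; minB := $A0; end;  // reject overlong 3-byte forms
      $ED:      begin need := 2; maxB := $9F; end;  // reject surrogates U+D800..U+DFFF
      $E1..$EC, $EE..$EF: need := 2;
      $F0:      begin need := 3; minB := $90; end;  // reject overlong 4-byte forms
      $F1..$F3: need := 3;
      $F4:      begin need := 3; maxB := $8F; end;  // nothing above U+10FFFF
    else
      Exit;                                         // stray continuation or invalid lead byte
    end;
    if i + need > n then
      Exit;                                         // truncated sequence: test before reading
    for k := 1 to need do
    begin
      b := Ord(S[i + k]);
      if (b < minB) or (b > maxB) then
        Exit;
      minB := $80;                                  // only the first continuation byte
      maxB := $BF;                                  // has a restricted range
    end;
    Inc(i, need + 1);
  end;
  Result := True;
end;

begin
  WriteLn(IsValidUTF8('abc'));          // plain ASCII: valid
  WriteLn(IsValidUTF8(#$09#$01));       // ASCII control chars: valid
  WriteLn(IsValidUTF8(#$C2#$A9));       // U+00A9, complete 2-byte sequence: valid
  WriteLn(IsValidUTF8(#$C2));           // lone lead byte, truncated: invalid
  WriteLn(IsValidUTF8(#$ED#$A0#$80));   // encoded surrogate U+D800: invalid
end.
```

The length test before the inner loop is what keeps the lone #$C2 case from indexing past the end of the string, whatever the range-checking settings are.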
