Both functions are working properly. Thank you.

Panagiotis

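For anyone reading the thread later, the byte-versus-character distinction discussed below can be tried without the LCL. This is only a minimal sketch in plain Free Pascal: CountUtf8Chars and CharPosOf are hypothetical helper names made up for this illustration, not the LCLProc functions, and the sample text is an assumption chosen so the numbers are easy to check.

```pascal
program Utf8Sketch;
{$mode objfpc}{$H+}

// Hypothetical helper (not from LCLProc): number of UTF-8 characters that
// start within the first ByteCount bytes of s.
function CountUtf8Chars(const s: string; ByteCount: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  if ByteCount > Length(s) then
    ByteCount := Length(s);
  for i := 1 to ByteCount do
    // Continuation bytes have the bit pattern 10xxxxxx; any other byte
    // begins a new character.
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

// Hypothetical helper: 1-based character index of SearchFor in SearchIn,
// or 0 when it is not found.
function CharPosOf(const SearchFor, SearchIn: string): Integer;
var
  BytePos: Integer;
begin
  // A plain byte-wise search is safe on UTF-8 text.
  BytePos := Pos(SearchFor, SearchIn);
  if BytePos > 0 then
    Result := CountUtf8Chars(SearchIn, BytePos - 1) + 1
  else
    Result := 0;
end;

const
  // Greek alpha, beta, gamma written as explicit UTF-8 bytes (2 bytes each),
  // so the example does not depend on the source file encoding.
  Greek = #$CE#$B1#$CE#$B2#$CE#$B3;

var
  Sample: string;
begin
  Sample := Greek + 'abc';
  WriteLn('byte position: ', Pos('abc', Sample));
  WriteLn('char position: ', CharPosOf('abc', Sample));
end.
```

With three two-byte Greek letters in front of 'abc', the byte position is 7 while the character position is 4. This is the same kind of mismatch as the 41 versus 21 reported in the thread.
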
On Tue, 28-02-2006, at 17:48 +0100, Mattias Gaertner wrote:
> On Tue, 28 Feb 2006 18:08:34 +0200
> "Panagiotis Sidiropoulos" <[EMAIL PROTECTED]> wrote:
>
> > For anyone interested in a UTF8Pos function, here is one as suggested by
> > Vincent and Mattias:
> >
> > // Find position into a utf string
> > function UTF8Pos( cSearcFor, cSearchInto: UTF8String ): integer;
> > var
> >   nPos: integer;
> >
> > begin
> >   nPos := pos( cSearcFor, cSearchInto );
> >   if nPos > 0 then Result := UTF8Length( copy( cSearchInto, 1, nPos - 1 ) )
> >   else Result := 0;
> > end;
>
> Better use PChar for speed.
> I added the following two functions to LCLProc:
>
> function UTF8Pos(const SearchForText, SearchInText: string): integer;
> // returns the character index, where the SearchForText starts in SearchInText
> var
>   p: LongInt;
> begin
>   p:=System.Pos(SearchForText,SearchInText);
>   if p>0 then
>     Result:=UTF8Length(PChar(SearchInText),p-1)+1
>   else
>     Result:=0;
> end;
>
> function UTF8Copy(const s: string; StartCharIndex, CharCount: integer): string;
> // returns substring
> var
>   StartBytePos: PChar;
>   EndBytePos: PChar;
>   MaxBytes: PtrInt;
> begin
>   StartBytePos:=UTF8CharStart(PChar(s),length(s),StartCharIndex-1);
>   if StartBytePos=nil then
>     Result:=''
>   else begin
>     MaxBytes:=PtrInt(PChar(s)+length(s)-StartBytePos);
>     EndBytePos:=UTF8CharStart(StartBytePos,MaxBytes,CharCount);
>     if EndBytePos=nil then
>       Result:=copy(s,StartBytePos-PChar(s)+1,MaxBytes)
>     else
>       Result:=copy(s,StartBytePos-PChar(s)+1,EndBytePos-StartBytePos);
>   end;
> end;
>
>
> Mattias
>
> > Now, I'm trying to write a UTF8Copy function to return a specific
> > amount of characters (not bytes) from a string. Here is what I've done
> > till now. It does not work correctly. Do you think I'm on the right path
> > or is there any other, smarter, way to do this?
> >
> > // Get a utf character at a specific position
> > function UTF8Copy( cCopyFrom: UTF8String; nFromPosition, nNoOfChars: integer ): UTF8String;
> > var
> >   i,
> >   nUTF8Len,
> >   nByteLen,
> >   nStart: integer;
> >
> > begin
> >   Result := '';
> >   nUTF8Len := UTF8Length( cCopyFrom );
> >   if nFromPosition > nUTF8Len then exit;
> >
> >   nByteLen := Length( cCopyFrom );
> >   nStart := 0;
> >   for i := 1 to nByteLen do begin
> >
> >     if UTF8Length( copy( cCopyFrom, 1, i ) ) = nFromPosition then
> >       nStart := i + 1;
> >     if ( nStart > 0 ) and
> >        ( UTF8Length( copy( cCopyFrom, nStart, i ) ) = nNoOfChars )
> >     then break;
> >
> >   end;
>
> maybe better:
>
> var
>   pCopyFrom
> pCopyFrom:=UTF8CharStart(PChar(cCopyFrom),length(cCopyFrom),nFromPosition-1);
> if
>
> >   Result := copy( cCopyFrom, nStart, i );
> > end;
> >
> > Panagiotis
> >
> > -----Original Message-----
> > From: Mattias Gaertner [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, February 28, 2006 12:24 PM
> > To: [email protected]
> > Subject: Re: [lazarus] String functions on non latin text
> >
> >
> > On Tue, 28 Feb 2006 09:57:09 +0200
> > "Panagiotis Sidiropoulos" <[EMAIL PROTECTED]> wrote:
> >
> > > >so if there is something wrong with the sample I thought it should
> > > >be gtk2 and the only problem I found was the position returned
> > > >mismatched visually the substring
> > >
> > > I tried to find a relation between the results but there is no
> > > pattern of any kind. For example, the first character gives 1, the
> > > second 3 and the 21st gives 41. The visual mismatch is the problem: I
> > > need to rearrange characters for indexing reasons and can't trace
> > > which character is which in the conversion table.
> >
> > Jesus is right.
> > UTF8 is a multi byte character encoding. This means a character has a
> > size varying between 1 and 4 bytes.
> > To get the character position use:
> >
> >   BytePos:=System.Pos(search,text);
> >   if BytePos>0 then
> >     CharPos:=UTF8Length(Pchar(text),BytePos-1)
> >   else
> >     CharPos:=0;
> >
> >
> > Mattias
> >
> > > I will try to update Lazarus and FPC, just to be sure.
> > >
> > > Panagiotis
> > >
> > > -----Original Message-----
> > > From: Jesus Reyes [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, February 28, 2006 8:30 AM
> > > To: [email protected]
> > > Subject: Re: [lazarus] String functions on non latin text
> > >
> > >
> > > ----- Original Message -----
> > > From: "Mattias Gaertner" <[EMAIL PROTECTED]>
> > > To: <[email protected]>
> > > Sent: Monday, February 27, 2006 2:45 PM
> > > Subject: Re: [lazarus] String functions on non latin text
> > >
> > > > On Mon, 27 Feb 2006 13:41:13 -0600 (CST)
> > > > Jesus Reyes <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > --- Panagiotis Sidiropoulos <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Please download sample project at:
> > > > > > - www.magentadb.gr/ftp/pos-sample.zip
> > > > > >
> > > > > > Panagiotis
> > > > >
> > > > > result := Pos(UTF8Decode(SubStr), UTF8Decode(Str));
> > > > >
> > > > > seems to work, I think Pos(UTF8String,UTF8String) is yet to be
> > > > > implemented.
> > > >
> > > > It does not need to be implemented. One nice feature of UTF8 is
> > > > that you can find out the start of a UTF8 character without parsing
> > > > the whole string. A simple substring search works with UTF8 and is
> > > > unambiguous.
> > >
> > > I guess it would depend on what the pos function's return value is
> > > needed for. If some feedback should be given to the user about the
> > > position where the substring matched, then the current pos function
> > > does not return a visually right position. I mean, counting
> > > characters from left to right, the correct position should be
> > > 21 not 41.
> > >
> > > If the value is to be used with other string functions then the
> > > return value is right.
> > >
> > > If the function is ever implemented I think it should be for something
> > > like
> > > pos(UTFString,UTFString) where UTFString should represent any UTF
> > > encoding in use. Unlikely? maybe :D
> > >
> > > > On the other hand: UTF8Decode will fail on some character sets, not
> > > > fitting into 2byte characters.
> > >
> > > It seems to have support for at least 3 byte chars. I didn't test
> > > tho..
> > >
> > > > My guess why a simple Pos does not work for Panagiotis is either an
> > > > FPC bug or a gtk1 bug with Greek characters.
> > >
> > > I compiled the test first for gtk1 and the results looked right to me,
> > > so if there is something wrong with the sample I thought it should be
> > > gtk2, and the only problem I found was that the position returned
> > > visually mismatched the substring.
> > >
> > > > Mattias
> > >
> > > Jesus Reyes A.
_________________________________________________________________
To unsubscribe: mail [EMAIL PROTECTED] with
"unsubscribe" as the Subject
archives at http://www.lazarus.freepascal.org/mailarchives
