Both functions work properly.
Thank you
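
For anyone finding this thread in the archives, here is a minimal, self-contained sketch of the byte-versus-character distinction discussed below. It does not use LCLProc: CountUtf8Chars and CharPosUtf8 are hypothetical names made up for this example, and it assumes the input is valid UTF-8.

```pascal
program Utf8PosDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of LCLProc): counts the UTF-8 characters
// that start within the first ByteCount bytes of S. Continuation bytes
// have the bit pattern 10xxxxxx and are skipped.
function CountUtf8Chars(const S: string; ByteCount: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  if ByteCount > Length(S) then
    ByteCount := Length(S);
  for i := 1 to ByteCount do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

// Hypothetical helper: 1-based character index of Needle in Haystack,
// or 0 if Needle does not occur.
function CharPosUtf8(const Needle, Haystack: string): Integer;
var
  BytePos: Integer;
begin
  BytePos := Pos(Needle, Haystack);   // byte index, not character index
  if BytePos > 0 then
    Result := CountUtf8Chars(Haystack, BytePos - 1) + 1
  else
    Result := 0;
end;

var
  Haystack, Needle: string;
begin
  // Greek alpha, beta, gamma given as explicit UTF-8 bytes (2 bytes each),
  // so the example does not depend on the source file encoding.
  Haystack := #$CE#$B1#$CE#$B2#$CE#$B3;
  Needle := #$CE#$B3;
  WriteLn('byte position: ', Pos(Needle, Haystack));
  WriteLn('char position: ', CharPosUtf8(Needle, Haystack));
end.
```

This prints byte position 5 and char position 3. It is the same effect as the 41 versus 21 reported for the Greek sample, where every character takes two bytes.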

Panagiotis
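
P.S. In the same spirit, a rough sketch of copying by characters instead of bytes, again without LCLProc. Utf8CharToByteIndex and CopyUtf8Chars are hypothetical names used only here, and valid UTF-8 input is assumed; the UTF8Copy quoted below is the version to use in real code.

```pascal
program Utf8CopyDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: 1-based byte index at which the CharIndex-th
// character (1-based) of S starts, or 0 if S has fewer characters.
function Utf8CharToByteIndex(const S: string; CharIndex: Integer): Integer;
var
  i, Seen: Integer;
begin
  Result := 0;
  if CharIndex < 1 then
    Exit;
  Seen := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then   // not a continuation byte
    begin
      Inc(Seen);
      if Seen = CharIndex then
      begin
        Result := i;
        Exit;
      end;
    end;
end;

// Hypothetical helper: copies CharCount characters of S, starting at
// character StartChar. Returns '' if StartChar is out of range.
function CopyUtf8Chars(const S: string; StartChar, CharCount: Integer): string;
var
  StartByte, EndByte: Integer;
begin
  Result := '';
  if CharCount < 1 then
    Exit;
  StartByte := Utf8CharToByteIndex(S, StartChar);
  if StartByte = 0 then
    Exit;
  // Start of the first character after the requested range, if there is one
  EndByte := Utf8CharToByteIndex(S, StartChar + CharCount);
  if EndByte = 0 then
    EndByte := Length(S) + 1;
  Result := Copy(S, StartByte, EndByte - StartByte);
end;

var
  Sample, Part: string;
begin
  // 'a', Greek beta, Greek gamma, 'd': 1 + 2 + 2 + 1 bytes
  Sample := 'a' + #$CE#$B2#$CE#$B3 + 'd';
  Part := CopyUtf8Chars(Sample, 2, 2);
  WriteLn('bytes copied: ', Length(Part));
end.
```

This prints "bytes copied: 4", because the two Greek characters requested occupy two bytes each.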

On Tue, 28-02-2006 at 17:48 +0100, Mattias Gaertner wrote:
> On Tue, 28 Feb 2006 18:08:34 +0200
> "Panagiotis Sidiropoulos" <[EMAIL PROTECTED]> wrote:
> 
> > For anyone interested in a UTF8Pos function, here is one as suggested by
> > Vincent and Mattias:
> > 
> > // Find the character position of a substring in a UTF-8 string
> > function UTF8Pos( cSearchFor, cSearchInto: UTF8String ): integer;
> > var
> >    nPos: integer;
> >
> > begin
> >      // pos() returns a byte index; convert it to a 1-based character index
> >      nPos := pos( cSearchFor, cSearchInto );
> >      if  nPos > 0 then Result := UTF8Length( copy( cSearchInto, 1, nPos - 1 ) ) + 1
> >      else Result := 0;
> > end;
> 
> Better use PChar for speed.
> I added the following two functions to LCLProc:
> 
> function UTF8Pos(const SearchForText, SearchInText: string): integer;
> // returns the character index, where the SearchForText starts in SearchInText
> var
>   p: LongInt;
> begin
>   p:=System.Pos(SearchForText,SearchInText);
>   if p>0 then
>     Result:=UTF8Length(PChar(SearchInText),p-1)+1
>   else
>     Result:=0;
> end;
> 
> function UTF8Copy(const s: string; StartCharIndex, CharCount: integer): string;
> // returns substring
> var
>   StartBytePos: PChar;
>   EndBytePos: PChar;
>   MaxBytes: PtrInt;
> begin
>   StartBytePos:=UTF8CharStart(PChar(s),length(s),StartCharIndex-1);
>   if StartBytePos=nil then
>     Result:=''
>   else begin
>     MaxBytes:=PtrInt(PChar(s)+length(s)-StartBytePos);
>     EndBytePos:=UTF8CharStart(StartBytePos,MaxBytes,CharCount);
>     if EndBytePos=nil then
>       Result:=copy(s,StartBytePos-PChar(s)+1,MaxBytes)
>     else
>       Result:=copy(s,StartBytePos-PChar(s)+1,EndBytePos-StartBytePos);
>   end;
> end;
> 
> 
> Mattias
> 
> 
> 
> > 
> > Now, I'm trying to write a UTF8Copy function to return a specific
> > number of characters (not bytes) from a string. Here is what I've done
> > so far. It does not work correctly. Do you think I'm on the right track,
> > or is there another, smarter way to do this?
> > 
> > // Copy a number of UTF-8 characters starting at a specific position
> > function UTF8Copy( cCopyFrom: UTF8String; nFromPosition, nNoOfChars: integer ): UTF8String;
> > var
> >    i,
> >    nUTF8Len,
> >    nByteLen,
> >    nStart: integer;
> > 
> > begin
> >      Result := '';
> >      nUTF8Len := UTF8Length( cCopyFrom );
> >      if nFromPosition > nUTF8Len then exit;
> >      
> >      nByteLen := Length( cCopyFrom );
> >      nStart := 0;
> >      for i := 1 to nByteLen do begin
> > 
> >          if UTF8Length( copy( cCopyFrom, 1, i ) ) = nFromPosition then nStart := i + 1;
> >          if ( nStart > 0 ) and
> >             ( UTF8Length( copy( cCopyFrom, nStart, i ) ) = nNoOfChars ) then break;
> > 
> >      end;
> 
> maybe better:
> 
> var 
>   pCopyFrom: PChar;
> pCopyFrom:=UTF8CharStart(PChar(cCopyFrom),length(cCopyFrom),nFromPosition-1);
> if 
> 
> 
> >      Result := copy( cCopyFrom, nStart, i );
> > end;
> > 
> > Panagiotis
> > 
> > -----Original Message-----
> > From: Mattias Gaertner [mailto:[EMAIL PROTECTED] 
> > Sent: Tuesday, February 28, 2006 12:24 PM
> > To: [email protected]
> > Subject: Re: [lazarus] String functions on non latin text
> > 
> > 
> > On Tue, 28 Feb 2006 09:57:09 +0200
> > "Panagiotis Sidiropoulos" <[EMAIL PROTECTED]> wrote:
> > 
> > > >so if there is something wrong with the sample I thought it should
> > > >be gtk2, and the only problem I found was that the position returned
> > > >did not visually match the substring
> > > 
> > > I tried to find a relation between the results but there is no
> > > pattern; for example, the first character gives 1, the second 3 and
> > > the 21st gives 41. The visual mismatch is the problem: I need to rearrange
> > > characters for indexing reasons and can't trace which character is which
> > > in the conversion table.
> > 
> > Jesus is right.
> > UTF8 is a multi-byte character encoding. This means a character
> > occupies between 1 and 4 bytes. To get the character position use:
> > 
> > BytePos:=System.Pos(search,text);
> > if BytePos>0 then
> >   CharPos:=UTF8Length(PChar(text),BytePos-1)+1
> > else
> >   CharPos:=0;
> > 
> >   
> > Mattias
> > 
> > 
> > > 
> > > I will try to update Lazarus and FPC, just to be sure.
> > > 
> > > Panagiotis
> > > 
> > > -----Original Message-----
> > > From: Jesus Reyes [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, February 28, 2006 8:30 AM
> > > To: [email protected]
> > > Subject: Re: [lazarus] String functions on non latin text
> > > 
> > > 
> > > 
> > > ----- Original Message -----
> > > From: "Mattias Gaertner" <[EMAIL PROTECTED]>
> > > To: <[email protected]>
> > > Sent: Monday, February 27, 2006 2:45 PM
> > > Subject: Re: [lazarus] String functions on non latin text
> > > 
> > > 
> > > > On Mon, 27 Feb 2006 13:41:13 -0600 (CST)
> > > > Jesus Reyes <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > 
> > > > > >  --- Panagiotis Sidiropoulos <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > Please download sample project at:
> > > > > > - www.magentadb.gr/ftp/pos-sample.zip
> > > > > > 
> > > > > > Panagiotis
> > > > > > 
> > > > > 
> > > > > result := Pos(UTF8Decode(SubStr), UTF8Decode(Str));
> > > > > 
> > > > > seems to work, I think Pos(UTF8String,UTF8String) is yet to be
> > > > > implemented.
> > > > 
> > > > It does not need to be implemented. One nice feature of UTF8 is that
> > > > you can find the start of a UTF8 character without parsing the
> > > > whole string. A simple substring search works with UTF8 and is
> > > > unambiguous.
> > > 
> > > I guess it depends on what the pos function's return value is needed for.
> > > If some feedback should be given to the user about the position where the
> > > substring matched, then the current pos function does not return a
> > > visually correct position. I mean, counting characters from left to
> > > right, the correct position should be 21, not 41.
> > > 
> > > If the value is to be used with other string functions, then the return
> > > value is right.
> > > 
> > > If the function is ever implemented, I think it should be something like
> > > pos(UTFString,UTFString), where UTFString represents any UTF
> > > encoding in use. Unlikely? Maybe :D
> > > 
> > > > On the other hand: UTF8Decode will fail on some character sets that do
> > > > not fit into 2-byte characters.
> > > 
> > > It seems to support at least 3-byte chars. I didn't test it,
> > > though.
> > > 
> > > > 
> > > > My guess as to why a simple Pos does not work for Panagiotis is either
> > > > an FPC bug or a gtk1 bug with Greek characters.
> > > > 
> > > 
> > > I compiled the test first for gtk1 and the results looked right to me, so
> > > if there is something wrong with the sample I thought it should be
> > > gtk2, and the only problem I found was that the position returned did not
> > > visually match the substring.
> > > 
> > > > 
> > > > Mattias
> > > > 
> > > 
> > > Jesus Reyes A.
> > > 
> > > _________________________________________________________________
> > >      To unsubscribe: mail [EMAIL PROTECTED] with
> > >                 "unsubscribe" as the Subject
> > >    archives at http://www.lazarus.freepascal.org/mailarchives