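1) Pass a starting position to Position instead of cutting the text down on each call.
2) Pass * to Position whenever possible. It compares character codes directly, and the performance difference is huge.
3) Never change the size of your source or result during processing. This is the major performance issue, because every truncation copies the rest of the text.
4) If your library is often used with a large source or result, try a BLOB, which would be very fast.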
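
Points 1 to 3 could be sketched roughly as below. This is only an illustration, not tested against the original data. It assumes a 4D version whose Position accepts the optional start and * parameters and which allows pointers to local variables. The method name utl_NextField and its cursor parameter are hypothetical. They stand in for the truncation pointer, so the source text is never resized.

```4d
  // Project method: utl_NextField (hypothetical name, for illustration only)
  // $1 - pointer - to the source text (by pointer, so the large text is not copied)
  // $2 - pointer - to a longint cursor: start it at 1, it is set to 0 when the text is exhausted
  // $3 - text - delimiter (1 or more characters)
  // $0 - text - field found between the cursor and the next delimiter
C_POINTER($1;$2)
C_TEXT($3;$0)
C_LONGINT($start;$found)

$start:=$2->
$found:=Position($3;$1->;$start;*)  // search from the cursor; * compares character codes

Case of
	: ($found=0)  // no more delimiters: return the rest of the text
		$0:=Substring($1->;$start)
		$2->:=0
	: ($found=$start)  // delimiter right at the cursor: empty field
		$0:=""
		$2->:=$found+Length($3)
	Else
		$0:=Substring($1->;$start;$found-$start)
		$2->:=$found+Length($3)  // move the cursor past the whole delimiter
End case
```

A caller would then loop on the cursor instead of asking for truncation:

```4d
C_TEXT($source;$line)
C_LONGINT($cursor)
  // $source is assumed to already hold the CR-delimited text
$cursor:=1
While ($cursor>0)
	$line:=utl_NextField (->$source;->$cursor;Char(13))  // Char(13) = carriage return
	  // process $line here
End while
```

Each call copies only the field it returns, so the cost per line no longer depends on how much text remains.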
Alan Chan

4D iNug Technical <[email protected]> writes:
>I have a routine which parses text.
>It seemed to function well until recently, when I had to feed it 50
>megs of text (48.3 million characters).
>The data is CR delimited, and each line of text is of variable length.
>
>I am using the truncate option mentioned below, so each time the
>source/original text is shorter.
>
>It takes a LONG time to process.
>The basic scheme is:
>- Locate the desired delimiter (1 or more characters) occurrence (1 or
>more times)
>- Return the text between either the start of text, or the previous
>delimiter, and the final one
>- Optionally truncate the original text, removing the located text.
>
>ex:
>utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
>
>If truncating, the original text ("A,B,C,D,E,F") would become "D,E,F"
>
>The routine uses Substring and Position to accomplish this task.
>
>Does anyone have a "better" text parser?
>
>--------
>Follows my parsing code:
> //Project Method: utl_parsestring
> // $1 - text - to be searched
> // $2 - integer - number of times to locate character
> // $3 - string (optional) - the character to search for (default = Tab)
> // $4 - pointer (optional) - pointer to initial string to allow truncation
> //      (destructive parsing)
>
> //RETURNS - text - text found between occurrence N and N-1 (preceding)
> //instance of the separator character indicated
> //Ex: utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
> //    utl_ParseString("A,B,C,D,E,F"; 1; ",") -> "A"
> //    utl_ParseString("A,B,C,D,E,F"; 6; ",") -> "F"
> //    utl_ParseString("A,B,C,D,E,F"; 0; ",") -> ""
>C_TEXT($0;$String;$1;$Return_Value)
>C_LONGINT($Wanted;$2;$i;$Found)
>C_TEXT($Search;$3)
>C_POINTER($4;$Truncate)
>
>$String:=$1 //string/text to be searched
>$Wanted:=$2 //the number of times to find the character in the incoming string
>
>If (Count parameters=2) //if this is looking just for tabs
>  $Search:=<>x_Tab
>Else //assign passed string
>  $Search:=$3
>End if
>
>If (Count parameters=4) //we want to destructively parse the incoming string
>  $Truncate:=$4 //pointer to value to truncate
>End if
>
>If ($Wanted>0) & ($String#"") //if the number wanted is > 0 find instance
>
>  For ($i;1;$Wanted)
>    $Found:=utl_text_Position ($Search;$String) //locate next instance of character
>
>    Case of
>      : ($i<$Wanted) & ($Found>0) //if the number of char wanted is not yet reached
>        $String:=Substring($String;$Found+Length($Search)) //skip past the whole delimiter, not just 1 character
>
>      : ($Wanted=$i) & ($Found>0) //instance found
>        $Return_Value:=Substring($String;1;$Found-1)
>
>        If (Count parameters=4) //truncation was asked for, remove the returned string (and everything before it)
>          $Truncate->:=Substring($String;$Found+Length($Search)) //replace the incoming string with the truncated version (found removed)
>        End if
>
>      : ($Found=0) //no more instances
>        $i:=$Wanted+1 //end loop
>        $Return_Value:=$String
>
>        If (Count parameters=4)
>          $Truncate->:="" //replace the incoming string with empty string
>        End if
>    End case
>  End for
>Else //else # wanted <= zero return empty string
>  $Return_Value:=""
>End if
>$0:=$Return_Value
> //
>**********************************************************************
>4D Internet Users Group (4D iNUG)
>FAQ: http://lists.4d.com/faqnug.html
>Archive: http://lists.4d.com/archives.html
>Options: http://lists.4d.com/mailman/options/4d_tech
>Unsub: mailto:[email protected]
>**********************************************************************

