OK, I've been doing some testing and recoding, and I do not quite understand the results.

I wrote code to implement substring (see far below) and use it in a parsing routine (see below) on a text block of 2.7 million characters.
Time to process the entire block: 36.5 sec.
Using the exact same code, but with 4D's Substring command:
Time to process the entire block: 129.8 sec.

Why is the (presumably) compiled C code SLOWER than interpreted 4D code, by a factor of about 3.5?

---------
Parsing Routine (initialization code removed)

For ($i;1;$How_Many)
  $Start_Loc:=l_Last_Position+1
  $Found_Location:=utl_text_Position ($Find;$Source;$Start_Loc)
  If ($Found_Location>0) //found
    l_Last_Position:=$Find_Length+$Found_Location-1
  Else
    $i:=utl_Exit_Loop
  End if
End for
If ($i=MAXLONG) //not found, or not found enough
  $Return_Text:=utl_text_Faster_Substring ($Source;$Start_Loc)
Else //found requested occurrence count of Find
  $Return_Text:=utl_text_Faster_Substring($Source;$Start_Loc;$Found_Location-1)
End if
$0:=$Return_Text

----------------
//Project Method: utl_text_Faster_Substring
//$1 - text - source text to find substring
//$2 - longint - Start Location
//$3 - longint (optional) - Character count,
// if not provided, or zero, return all beginning at $2
//faster substring code
// ∙ Created 11/16/16 by Chip -
C_TEXT($1;$Source;$0;$Return_Text)
C_LONGINT($2;$Start_Location;$3;$Return_Length)
$Source:=$1
$Start_Location:=$2
$Source_Length:=Length($Source)
Case of
  : (Count parameters=2)
    $Return_Length:=Length($Source)
  : ($3=0)
    $Return_Length:=Length($Source)
  Else
    $Return_Length:=$3
End case
For ($i;$Start_Location;$Return_Length)
  $Current_Char:=$i+$Start_Location-1
  If ($Current_Char<=$Source_Length)
    $Return_Text:=$Return_Text+$Source≤$Current_Char≥
  Else
    $i:=utl_Exit_Loop
    $Return_Text:=""
  End if
End for
$0:=$Return_Text

On Wed, 16 Nov 2016 12:55:41 -0800, Douglas von Roeder wrote:
> Chip:
>
> If you haven't grabbed a copy already
> <http://www.pluggers.nl/product/api-pack/>, API Pack has a few BLOB
> routines that you might find handy including API Find in Blob, API Replace
> in Blob.
>
> --
> Douglas von Roeder
> 949-336-2902
>
> On Wed, Nov 16, 2016 at 12:44 PM, Alan Chan <[email protected]> wrote:
>
>> 1) Use Position with a starting position
>> 2) Use Position with * if possible - huge performance difference
>> 3) Never change the size of your source or result during processing - this
>> is the major performance issue
>> 4) If your library is often used with large sources/results, try using a
>> blob, which would be very fast.
>>
>> Alan Chan
>>
>>
>> 4D iNug Technical <[email protected]> writes:
>>> I have a routine which parses text.
>>> It seemed to function well, until recently, when I had to feed it 50
>>> megs of text (48.3 million characters).
>>> The data is Cr delimited, and each line of text is of variable length.
>>>
>>> I am using the below mentioned truncate option, so each time the
>>> source/original text is shorter.
>>>
>>>
>>> It takes a LONG time to process.
>>> The basic scheme is:
>>> - Locate desired delimiter (1 or more characters) occurrence (1 or more
>>> times)
>>> - return text between either start of text, or previous delimiter and
>>> final
>>> - optionally truncate original text removing located text.
>>>
>>> ex:
>>> utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
>>>
>>> if truncating, the original text ("A,B,C,D,E,F") would become "D,E,F"
>>>
>>> The routine uses Substring, and Position to accomplish this task.
>>>
>>> Does anyone have a "better" text parser?
>>>
>>>
>>> --------
>>> Follows my parsing code:
>>> //Project Method: utl_parsestring
>>> // $1 - text - to be searched
>>> // $2 - integer - number of times to locate character
>>> // $3 - string (optional) - the character to search for (default = Tab)
>>> // $4 - pointer (optional) - pointer to initial string to allow truncation
>>> // (Destructive parsing)
>>>
>>> //RETURNS - text - text found between occurrence N and N-1 (preceding)
>>> //instance of the separator character indicated
>>> //Ex: utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
>>> // utl_ParseString("A,B,C,D,E,F"; 1; ",") -> "A"
>>> // utl_ParseString("A,B,C,D,E,F"; 6; ",") -> "F"
>>> // utl_ParseString("A,B,C,D,E,F"; 0; ",") -> ""
>>> C_TEXT($0;$String;$1;$Return_Value)
>>> C_LONGINT($wanted;$2;$i;$Found)
>>> C_TEXT($Search;$3)
>>> C_POINTER($4;$Truncate)
>>>
>>> $String:=$1 //string/text to be searched
>>> $Wanted:=$2 //the number of times to find the character in the incoming string
>>>
>>> If (Count parameters=2) //if this is looking just for tabs
>>>   $Search:=<>x_Tab
>>> Else //assign passed string
>>>   $Search:=$3
>>> End if
>>>
>>> If (Count parameters=4) //we want to destructively parse the incoming string
>>>   $Truncate:=$4 //pointer to value to truncate
>>> End if
>>>
>>> If ($Wanted>0) & ($String#"") //if the number wanted is > 0 find instance
>>>
>>>   For ($i;1;$Wanted)
>>>     $Found:=utl_text_Position ($Search;$String) //locate next instance of character
>>>
>>>     Case of
>>>       : ($i<$Wanted) & ($Found>0) //if the number of char wanted is not yet reached
>>>         $String:=Substring($String;$Found+Length($Search)) //skip the whole delimiter, which may be longer than 1 character
>>>
>>>       : ($Wanted=$i) & ($Found>0) //instance found
>>>         $Return_Value:=Substring($String;1;$Found-1)
>>>
>>>         If (Count parameters=4) //truncation was asked for, remove the returned string (and everything before it)
>>>           $Truncate->:=Substring($String;$Found+Length($Search)) //replace the incoming string with the truncated version (found removed)
>>>         End if
>>>
>>>       : ($Found=0) //no more instances
>>>         $i:=$Wanted+1 //end loop
>>>         $Return_Value:=$String
>>>
>>>         If (Count parameters=4)
>>>           $Truncate->:="" //replace the incoming string with empty string
>>>         End if
>>>     End case
>>>   End for
>>> Else //else # wanted <= zero return empty string
>>>   $Return_Value:=""
>>> End if
>>> $0:=$Return_Value
>>> //
>>> **********************************************************************
>>> 4D Internet Users Group (4D iNUG)
>>> FAQ: http://lists.4d.com/faqnug.html
>>> Archive: http://lists.4d.com/archives.html
>>> Options: http://lists.4d.com/mailman/options/4d_tech
>>> Unsub: mailto:[email protected]
>>> **********************************************************************
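
For reference, tips 1 to 3 in Alan's quoted reply (give Position a starting position, use *, never resize the source) can be combined into one non-destructive pass. What follows is only a minimal sketch and not the original utl_ParseString: the method name utl_ParseNth and all its local variable names are hypothetical, and it assumes a Unicode-mode 4D (v11 or later) where Position accepts a start offset and the * option. It has not been timed against the routines above.

```4d
  // Project Method: utl_ParseNth (hypothetical name, illustration only)
  // $1 - text - source text (never modified or shortened)
  // $2 - longint - which delimited field to return (1-based)
  // $3 - text - delimiter, 1 or more characters
  // $0 - text - requested field; remainder of text if delimiters run out
C_TEXT($1;$source;$3;$delim;$0)
C_LONGINT($2;$wanted;$i;$start;$found;$delimLen)

$source:=$1
$wanted:=$2
$delim:=$3
$delimLen:=Length($delim)
$start:=1  // first character of the field currently being examined
$0:=""

If ($wanted>0) & ($delimLen>0)
	For ($i;1;$wanted)
		  // search forward from $start; * compares character codes
		$found:=Position($delim;$source;$start;*)
		Case of
			: ($found=0)  // no more delimiters: return what is left
				$0:=Substring($source;$start)
				$i:=$wanted  // end the loop
			: ($i=$wanted)  // requested field ends just before $found
				If ($found>$start)  // an empty field leaves $0 as ""
					$0:=Substring($source;$start;$found-$start)
				End if
			Else
				$start:=$found+$delimLen  // step past the delimiter only
		End case
	End for
End if
```

Because only $start moves, the source text is copied once per call instead of once per delimiter. A caller that walks every line of a large block would presumably go one step further and keep $start between calls rather than rescanning from position 1 each time.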

