Re: v12+ parsing text

Alan Chan Wed, 16 Nov 2016 12:44:56 -0800

1) With Position, use the start position parameter instead of cutting the
   source down with Substring.
2) With Position, use * if possible; it makes a huge performance difference.
3) Never change the size of your source or result during the process; this is
   the major performance issue.
4) If your library is used often with large sources/results, try using a BLOB,
   which would be very fast.
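A minimal sketch of tips 1 to 3 applied to the CR-delimited case in the quoted post, assuming 4D v12 or later. The method name utl_ParseLines, its parameters, and the block size of 10000 are hypothetical, not from this thread. The source text is never shortened: Position searches from a moving start position with *, and the result array grows in blocks instead of one element per line.

```4d
  // utl_ParseLines (hypothetical project method, not from this thread)
  // Splits CR-delimited text into a text array in a single pass.
  // $1 - text - source text; it is never shortened or modified
  // $2 - pointer - to a text array that receives one element per line
C_TEXT($1;$Delim)
C_POINTER($2;$Lines)
C_LONGINT($Length;$Start;$Found;$LengthFound;$Count)

$Lines:=$2
$Delim:=Char(Carriage return)
$Length:=Length($1)
ARRAY TEXT($Lines->;0)

$Start:=1  // tip 1: search from a start position instead of truncating $1
$Count:=0
While ($Start<=$Length)
    // tip 2: * compares character codes
    // ($LengthFound is filled by Position but not needed: a CR is one character)
  $Found:=Position($Delim;$1;$Start;$LengthFound;*)
  If ($Found=0)  // no CR left: the last line runs to the end of the text
    $Found:=$Length+1
  End if 
  $Count:=$Count+1
  If ($Count>Size of array($Lines->))  // tip 3: grow the result in blocks, not per line
    INSERT IN ARRAY($Lines->;$Count;10000)
  End if 
  $Lines->{$Count}:=Substring($1;$Start;$Found-$Start)
  $Start:=$Found+1  // the next line starts right after the CR
End while 
ARRAY TEXT($Lines->;$Count)  // trim the unused elements of the last block
```

Usage with a process array: ARRAY TEXT(aLines;0), then utl_ParseLines ($myText;->aLines). A trailing CR does not produce an extra empty element; empty lines in the middle of the text do.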


Alan Chan


4D iNug Technical <[email protected]> writes:
>I have a routine which parses text.
>It seemed to work well until recently, when I had to feed it 50 megs
>of text (48.3 million characters).
>The data is CR-delimited, and each line of text is of variable length.
>
>I am using the truncate option described below, so the source/original
>text gets shorter on each call.
>
>It takes a LONG time to process.
>The basic scheme is:
>- Locate the Nth occurrence (N >= 1) of the desired delimiter (1 or more
>  characters).
>- Return the text between that occurrence and either the start of the
>  text or the previous delimiter.
>- Optionally truncate the original text, removing the located text.
>
>Example:
>utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
>
>If truncating, the original text ("A,B,C,D,E,F") would become "D,E,F".
>
>The routine uses Substring, and Position to accomplish this task.
>
>Does anyone have a "better" text parser?
>
>
>--------
>Here is my parsing code:
>  //Project Method:  utl_ParseString
>  // $1 - text - text to be searched
>  // $2 - integer - number of times to locate the delimiter
>  // $3 - string (optional) - the delimiter to search for (default = Tab)
>  // $4 - pointer (optional) - pointer to the initial string, to allow truncation
>  //         (destructive parsing)
>
>  //RETURNS - text - text found between occurrence N and occurrence N-1 (the preceding one)
>  //of the separator indicated
>  //Ex:  utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
>  //       utl_ParseString("A,B,C,D,E,F"; 1; ",") -> "A"
>  //       utl_ParseString("A,B,C,D,E,F"; 6; ",") -> "F"
>  //       utl_ParseString("A,B,C,D,E,F"; 0; ",") -> ""
>C_TEXT($0;$String;$1;$Return_Value)
>C_LONGINT($Wanted;$2;$i;$Found)
>C_TEXT($Search;$3)
>C_POINTER($4;$Truncate)
>
>$String:=$1  //string/text to be searched
>$Wanted:=$2  //the number of times to find the delimiter in the incoming string
>
>If (Count parameters=2)  //no delimiter passed: look for tabs
>  $Search:=<>x_Tab
>Else   //use the passed delimiter
>  $Search:=$3
>End if 
>
>If (Count parameters=4)  //we want to destructively parse the incoming string
>  $Truncate:=$4  //pointer to the value to truncate
>End if 
>
>If ($Wanted>0) & ($String#"")  //if the number wanted is > 0, find the instance
>
>  For ($i;1;$Wanted)
>    $Found:=utl_text_Position ($Search;$String)  //locate the next instance of the delimiter
>
>    Case of 
>      : ($i<$Wanted) & ($Found>0)  //the wanted occurrence is not reached yet
>        $String:=Substring($String;$Found+Length($Search))  //skip the whole delimiter, not just 1 character
>
>      : ($Wanted=$i) & ($Found>0)  //instance found
>        $Return_Value:=Substring($String;1;$Found-1)
>
>        If (Count parameters=4)  //truncation was asked for: remove the returned string (and everything before it)
>          $Truncate->:=Substring($String;$Found+Length($Search))  //replace the incoming string with the truncated version (found text removed)
>        End if 
>
>      : ($Found=0)  //no more instances
>        $i:=$Wanted+1  //end loop
>        $Return_Value:=$String
>
>        If (Count parameters=4)
>          $Truncate->:=""  //replace the incoming string with an empty string
>        End if 
>    End case 
>  End for 
>Else   //nothing wanted (<= zero) or empty text: return an empty string
>  $Return_Value:=""
>End if 
>$0:=$Return_Value
>  //
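For comparison, a minimal sketch of the same lookup without truncation, assuming 4D v12 or later. utl_ParseString2 is a hypothetical name, not from this thread. It keeps the utl_ParseString contract for $1 to $3 (including returning the last field when N exceeds the number of fields) but drops the $4 truncation option, and it moves a start position through the text with Position instead of copying the remainder with Substring on every pass. It uses Char(Tab) in place of <>x_Tab, and the * option, so the delimiter match is by character code.

```4d
  // utl_ParseString2 (hypothetical project method, not from this thread)
  // Same result as utl_ParseString for $1 to $3, without the $4 truncation option.
  // $1 is never shortened: $Start moves through it instead.
C_TEXT($0;$1;$3;$Search)
C_LONGINT($2;$i;$Start;$Found;$LengthFound)

If (Count parameters>=3)
  $Search:=$3
Else   // default delimiter: Tab
  $Search:=Char(Tab)
End if 

$0:=""
If ($2>0) & ($1#"")
  $Start:=1  // first character of the current field
  For ($i;1;$2)
    $Found:=Position($Search;$1;$Start;$LengthFound;*)
    Case of 
      : ($Found=0)  // no more delimiters: return the rest of the text
        $0:=Substring($1;$Start)
        $i:=$2+1  // end loop
      : ($i=$2)  // Nth delimiter found: return the field before it
        $0:=Substring($1;$Start;$Found-$Start)
      Else   // skip this field and its delimiter (which may be several characters)
        $Start:=$Found+$LengthFound
    End case 
  End for 
End if 
```

Calling it once per line still hands the whole text to the method on every call, so for walking all lines of a 50 MB text, a single pass over the text that keeps its own start position is likely the better fit.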
>**********************************************************************
>4D Internet Users Group (4D iNUG)
>FAQ:  http://lists.4d.com/faqnug.html
>Archive:  http://lists.4d.com/archives.html
>Options: http://lists.4d.com/mailman/options/4d_tech
>Unsub:  mailto:[email protected]
>**********************************************************************
