I have a routine that parses text.
It seemed to work well until recently, when I had to feed it 50 MB 
of text (48.3 million characters).
The data is CR-delimited, and each line of text is of variable length.

I am using the truncate option described below, so the source text 
gets shorter with each call.


It takes a LONG time to process.
The basic scheme is:
- Locate the Nth occurrence (N >= 1) of the desired delimiter (1 or 
more characters).
- Return the text between that occurrence and either the previous 
delimiter or the start of the text.
- Optionally truncate the original text, removing the located text.

Ex:
utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"

If truncating, the original text ("A,B,C,D,E,F") would become "D,E,F".
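
For illustration, a minimal sketch of how the truncate option is called: a pointer to the same variable is passed as the fourth parameter. The variable names are hypothetical, the CR loop is only an assumption about how the large import drives the routine, and the sketch assumes a 4D version that allows pointers to local variables.

```4d
C_TEXT($data;$field;$line)

$data:="A,B,C,D,E,F"
$field:=utl_ParseString ($data;3;",";->$data)
  // $field is now "C" and $data is now "D,E,F"

  // Assumed driver for CR-delimited data: peel off one line per call
$data:="first line"+Char(13)+"second line"+Char(13)+"third line"
While ($data#"")
  $line:=utl_ParseString ($data;1;Char(13);->$data)
    // ... process $line here ...
End while 
```

Each pass through the loop hands the entire remaining text to the routine, which is where the cost grows with the size of the input.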

The routine uses Substring and Position to accomplish this task.

Does anyone have a "better" text parser?


--------
My parsing code follows:
  //Project Method:  utl_parsestring
  // $1 - text - to be searched
  // $2 - integer - number of times to locate character
  // $3 - string (optional) - the character to search for (default = Tab)
  // $4 - pointer (optional) - pointer to initial string to allow truncation
  //         (Destructive parsing)

  //RETURNS - text - text found between occurrence N and N-1 (preceding)
  //instance of the separator character indicated
  //Ex:  utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
  //       utl_ParseString("A,B,C,D,E,F"; 1; ",") -> "A"
  //       utl_ParseString("A,B,C,D,E,F"; 6; ",") -> "F"
  //       utl_ParseString("A,B,C,D,E,F"; 0; ",") -> ""
C_TEXT($0;$String;$1;$Return_Value)
C_LONGINT($Wanted;$2;$i;$Found)
C_TEXT($Search;$3)
C_POINTER($4;$Truncate)

$String:=$1  //string/text to be searched
$Wanted:=$2  //the number of times to find the character in the incoming string

If (Count parameters=2)  //if this is looking just for tabs
  $Search:=<>x_Tab
Else   //assign passed string
  $Search:=$3
End if 

If (Count parameters=4)  //we want to destructively parse the incoming string
  $Truncate:=$4  //pointer to value to truncate
End if 

If ($Wanted>0) & ($String#"")  //if the number wanted is > 0, find instance

  For ($i;1;$Wanted)
    $Found:=utl_text_Position ($Search;$String)  //locate next instance of character

    Case of 
      : ($i<$Wanted) & ($Found>0)  //the number of instances wanted is not yet reached
        $String:=Substring($String;$Found+Length($Search))  //skip the whole delimiter, which may be longer than 1 character

      : ($Wanted=$i) & ($Found>0)  //instance found
        $Return_Value:=Substring($String;1;$Found-1)

        If (Count parameters=4)  //truncation was asked for: remove the returned string (and everything before it)
          $Truncate->:=Substring($String;$Found+Length($Search))  //replace the incoming string with the truncated version (found removed)
        End if 

      : ($Found=0)  //no more instances
        $i:=$Wanted+1  //end loop
        $Return_Value:=$String

        If (Count parameters=4)
          $Truncate->:=""  //replace the incoming string with empty string
        End if 
    End case 
  End for 
Else   //else # wanted <= zero return empty string
  $Return_Value:=""
End if 
$0:=$Return_Value
  //
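
A likely cost in the method above is that every call copies the whole remaining text into $String, and every truncation builds another copy with Substring, so peeling lines one at a time off 48.3 million characters moves most of the text again on each call. Below is a minimal sketch of an alternative that walks the text once with a moving start offset and never shortens it. It assumes a 4D version in which Position accepts a third "start" parameter (v11 or later). The method name utl_SplitText and its parameters are hypothetical and not part of the original code, and the sketch has not been timed against the data described.

```4d
  //Project Method:  utl_SplitText  (hypothetical name)
  // $1 - text - to be split (never modified)
  // $2 - text - delimiter, 1 or more characters
  // $3 - pointer - to a text array that receives the pieces
C_TEXT($1;$2)
C_POINTER($3)
C_LONGINT($Start;$Found;$DelimLen)

$DelimLen:=Length($2)
$Start:=1  //offset of the first unread character
ARRAY TEXT($3->;0)  //empty the result array

If ($DelimLen>0)
  Repeat 
    $Found:=Position($2;$1;$Start)  //search from $Start onward, no truncation
    Case of 
      : ($Found>$Start)  //non-empty piece before the delimiter
        APPEND TO ARRAY($3->;Substring($1;$Start;$Found-$Start))
        $Start:=$Found+$DelimLen  //skip past the delimiter

      : ($Found>0)  //delimiter sits right at $Start: empty piece
        APPEND TO ARRAY($3->;"")
        $Start:=$Found+$DelimLen

      Else   //no more delimiters: the rest is the last piece
        APPEND TO ARRAY($3->;Substring($1;$Start))
    End case 
  Until ($Found=0)
End if 
```

With a text array Lines declared by the caller, a call such as utl_SplitText ($data;Char(13);->Lines) would fill it with one element per line. If holding every piece in memory is too expensive, the same loop can process each piece where APPEND TO ARRAY is called instead of storing it.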
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[email protected]
**********************************************************************