I have a routine that parses text.
It seemed to work well until recently, when I had to feed it 50
megs of text (48.3 million characters).
The data is CR-delimited, and each line of text is of variable length.
I am using the truncate option described below, so the source text
gets shorter on each call.
It takes a LONG time to process.
The basic scheme is:
- Locate the Nth occurrence (N = 1 or more) of the desired delimiter
(1 or more characters).
- Return the text between the start of the text (or the previous
delimiter) and that occurrence.
- Optionally truncate the original text, removing the returned text
and everything before it.
Ex:
utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
If truncating, the original text ("A,B,C,D,E,F") would become "D,E,F".
The routine uses Substring and Position to accomplish this task.
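To make the truncating use concrete, here is a minimal sketch of a calling loop that walks CR-delimited text one line at a time. The loop and the variables $data and $line are illustrative assumptions, not code from the actual project; only utl_ParseString and its four parameters come from the method listed below.

```4d
 //Hypothetical calling loop (an assumption, not part of the original routine)
 //$data is assumed to already hold the whole CR-delimited text
C_TEXT($data;$line)
While (Length($data)>0)
   //return the first line and let the routine truncate $data in place
  $line:=utl_ParseString ($data;1;Char(13);->$data)
   //... process $line here ...
End while
```

Used this way, every call copies the remaining text at least twice (once into $String, once in the Substring that builds the truncated value), so the total work would grow roughly with the square of the text length. That is an assumption about where the time goes, not a measurement.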
Does anyone have a "better" text parser?
--------
My parsing code follows:
//Project Method: utl_ParseString
// $1 - text - to be searched
// $2 - integer - number of times to locate character
// $3 - string (optional) - the character(s) to search for (default = Tab)
// $4 - pointer (optional) - pointer to initial string to allow truncation
// (destructive parsing)
//RETURNS - text - text found between occurrence N and N-1 (preceding)
//instance of the separator character indicated
//Ex: utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
// utl_ParseString("A,B,C,D,E,F"; 1; ",") -> "A"
// utl_ParseString("A,B,C,D,E,F"; 6; ",") -> "F"
// utl_ParseString("A,B,C,D,E,F"; 0; ",") -> ""
C_TEXT($0;$String;$1;$Return_Value)
C_LONGINT($wanted;$2;$i;$Found)
C_TEXT($Search;$3)
C_POINTER($4;$Truncate)
$String:=$1 //string/text to be searched
$Wanted:=$2 //the number of times to find the character in the incoming string
If (Count parameters=2) //if this is looking just for tabs
  $Search:=<>x_Tab
Else //assign passed string
  $Search:=$3
End if
If (Count parameters=4) //we want to destructively parse the incoming string
  $Truncate:=$4 //pointer to value to truncate
End if
If ($Wanted>0) & ($String#"") //if the number wanted is > 0 find instance
  For ($i;1;$Wanted)
    $Found:=utl_text_Position ($Search;$String) //locate next instance of character
    Case of
      : ($i<$Wanted) & ($Found>0) //if the number of char wanted is not yet reached
        $String:=Substring($String;$Found+Length($Search)) //skip the whole delimiter, which may be longer than 1 character
      : ($Wanted=$i) & ($Found>0) //instance found
        $Return_Value:=Substring($String;1;$Found-1)
        If (Count parameters=4) //truncation was asked for, remove the returned string (and everything before it)
          $Truncate->:=Substring($String;$Found+Length($Search)) //replace the incoming string with the truncated version (found removed)
        End if
      : ($Found=0) //no more instances
        $i:=$Wanted+1 //end loop
        $Return_Value:=$String
        If (Count parameters=4)
          $Truncate->:="" //replace the incoming string with empty string
        End if
    End case
  End for
Else //else # wanted <= zero return empty string
  $Return_Value:=""
End if
$0:=$Return_Value
//