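I've done OK with this. It looks similar to the problem you are solving with
text. It's non-destructive (it tracks a start position instead of truncating
the source text), so that should help the speed. If it's really fast, it's
because of the * in Position, which makes the comparison by character code.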
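For anyone who hasn't used those Position parameters, here is a minimal sketch of what the start argument and the * do. The sample text and variable names are made up for illustration, and it assumes a 4D version where Position accepts a start position and the * flag.

```4d
C_TEXT($sample)
C_LONGINT($pos)
$sample:="Alpha,beta,ALPHA"
$pos:=Position("ALPHA";$sample)  // default comparison ignores case: matches "Alpha" at 1
$pos:=Position("ALPHA";$sample;1;*)  // character-code comparison: matches "ALPHA" at 12
$pos:=Position(",";$sample;7;*)  // search begins at character 7: finds the second comma at 11
```

Because the search can begin at any offset, the source text never has to be shortened between calls.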
// ----------------------------------------------------
// Method: Parse_GetNthWordFromPos
// --- returns the nth word of a text string using Position
// --- useable in loop: Parse_GetNthWordFromPos ($text;1;->$startPos)
// INPUT1: Text - to search
// INPUT2: Longint - index number to fetch
// INPUT3: {Pointer} - Longint default=1, ->start search position, <-next start position (-1 when the requested word is the last one)
// INPUT4: {Text} - default=space, word delimiter (could be Char(13), etc.)
// OUTPUT: Text - retrieved word
// ----------------------------------------------------
C_TEXT($0;$1;$text)
C_LONGINT($2;$index;$loopcount)
C_POINTER($3)
C_LONGINT($start;$oldStart;$posFound;$wdLen;$i)
C_BOOLEAN($mustEnd)
C_TEXT($4;$wordDelim)
$text:=$1
$index:=$2
$loopcount:=0
If (Count parameters>2)
$start:=$3->
Else
$start:=1
End if
$oldStart:=$start // always initialized, even if the loop below never runs
If (Count parameters>3)
$wordDelim:=$4
Else
$wordDelim:=" "
End if
$wdLen:=Length($wordDelim)
$mustEnd:=False // set to True when the requested word is the last one in the text
For ($i;1;$index)
$posFound:=Position($wordDelim;$text;$start;*)
$oldStart:=$start
If ($posFound#0)
$start:=$posFound+$wdLen //+1
Else
If ($i=$index)
$mustEnd:=True
End if
$i:=$index+1
$posFound:=Length($text)+$wdLen //+1
End if
End for
$0:=Substring($text;$oldStart;$posFound-$oldStart)
If (Count parameters>2)
$3->:=Choose($mustEnd;-1;$posFound+$wdLen) //+1)
End if
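
Used in a loop, the method above can walk a CR-delimited text one line at a time without ever shortening it. This is only a sketch: it assumes the method is installed as Parse_GetNthWordFromPos, and it relies on the pointer coming back as -1 once the last line has been returned (which is what the Choose at the end does). The sample text and the variable names are made up.

```4d
C_TEXT($source;$line;$cr)
C_LONGINT($startPos;$lineCount)
$cr:=Char(Carriage return)
$source:="alpha"+$cr+"beta"+$cr+"gamma"
$startPos:=1
$lineCount:=0
Repeat
	  // always ask for word 1, counted from the current start position
	$line:=Parse_GetNthWordFromPos ($source;1;->$startPos;$cr)
	$lineCount:=$lineCount+1
	  // ... process $line here ...
Until ($startPos=-1)  // -1 means no delimiter was left, so $line was the last one
```

Each call only advances $startPos, so the cost per line does not depend on how much text remains.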
Keith - CDI
> On Nov 16, 2016, at 1:12 PM, Chip Scheide <[email protected]> wrote:
>
> I have a routine which parses text.
> It seemed to function well, until recently, when I had to feed it 50
> megs of text (48.3 million characters).
> The data is Cr delimited, and each line of text is of variable length.
>
> I am using the below mentioned truncate option, so each time the
> source/original text is shorter.
>
>
> it takes a LONG time to process.
> the basic scheme is:
> - Locate desired delimiter (1 or more characters) occurrence (1 or more
> times)
> - return text between either start of text, or previous delimiter and
> final
> - optionally truncate original text removing located text.
>
> ex:
> utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
>
> if truncating, the original text ("A,B,C,D,E,F") would become "D,E,F"
>
> The routine uses Substring, and Position to accomplish this task.
>
> Does anyone have a "better" text parser?
>
>
> --------
> Follows my parsing code:
> //Project Method: utl_parsestring
> // $1 - text - to be searched
> // $2 - integer - number of times to locate character
> // $3 - string (optional) - the character to search for (default = Tab)
> // $4 - pointer (optional) - pointer to initial string to allow truncation
> // (Destructive parsing)
>
> //RETURNS - text - text found between occurrence N and N-1 (preceding)
> //instance of the separator character indicated
> //Ex: utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
> // utl_ParseString("A,B,C,D,E,F"; 1; ",") -> "A"
> // utl_ParseString("A,B,C,D,E,F"; 6; ",") -> "F"
> // utl_ParseString("A,B,C,D,E,F"; 0; ",") -> ""
> C_TEXT($0;$String;$1;$Return_Value)
> C_LONGINT($wanted;$2;$i;$Found)
> C_TEXT($Search;$3)
> C_POINTER($4;$Truncate)
>
> $String:=$1 //string/text to be searched
> $Wanted:=$2 //the number of times to find the character in the incoming string
>
> If (Count parameters=2) //if this is looking just for tabs
> $Search:=<>x_Tab
> Else //assign passed string
> $Search:=$3
> End if
>
> If (Count parameters=4) //we want to destructively parse the incoming string
> $Truncate:=$4 //pointer to value to truncate
> End if
>
> If ($Wanted>0) & ($String#"") //if the number wanted is > 0 find instance
>
> For ($i;1;$Wanted)
> $Found:=utl_text_Position ($Search;$String) //locate next instance of character
>
> Case of
> : ($i<$Wanted) & ($Found>0) //if the number of char wanted is not yet reached
> $String:=Substring($String;$Found+Length($Search)) //skip the whole delimiter, which may be longer than 1 character
>
> : ($Wanted=$i) & ($Found>0) //instance found
> $Return_Value:=Substring($String;1;$Found-1)
>
> If (Count parameters=4) //truncation was asked for, remove the returned string (and everything before it)
> $Truncate->:=Substring($String;$Found+Length($Search)) //replace the incoming string with the truncated version (found removed)
> End if
>
> : ($Found=0) //no more instances
> $i:=$Wanted+1 //end loop
> $Return_Value:=$String
>
> If (Count parameters=4)
> $Truncate->:="" //replace the incoming string with empty string
> End if
> End case
> End for
> Else //else # wanted <= zero return empty string
> $Return_Value:=""
> End if
> $0:=$Return_Value
> //
> **********************************************************************
> 4D Internet Users Group (4D iNUG)
> FAQ: http://lists.4d.com/faqnug.html
> Archive: http://lists.4d.com/archives.html
> Options: http://lists.4d.com/mailman/options/4d_tech
> Unsub: mailto:[email protected]
> **********************************************************************