Re: v12+ parsing text

Keith Culotta Wed, 16 Nov 2016 14:03:17 -0800

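I've done OK with the method below.  It seems similar to the problem you are 
solving with text.  It's non-destructive (it never truncates the source text, 
it just advances a start position), so that should help the speed.  If it's 
really fast, it's because of the * in Position, which compares by character 
code instead of doing the default case-insensitive match.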


  // ----------------------------------------------------
  // Method: Parse_GetNthWordFromPos
  // --- returns the nth word of a text string using Position
  // ---      useable in loop: Parse_GetNthWordFromPos ($text;1;->$startPos)
  // INPUT1: Text - to search
  // INPUT2: Longint - index number to fetch 
  // INPUT3: {Pointer} - Longint default=1,  ->start search position,  <-next start position (-1 once the last word has been returned)
  // INPUT4: {Text} - default=space, word delimiter (could be char(13), etc.)
  // OUTPUT:  Text - retrieved word
  // ----------------------------------------------------

C_TEXT($0;$1;$text)
C_LONGINT($2;$index;$loopcount)
C_POINTER($3)
C_LONGINT($start;$oldStart;$posFound;$wdLen;$i)
C_TEXT($4;$wordDelim)
C_BOOLEAN($maxed;$mustEnd)

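$text:=$1
$index:=$2
$loopcount:=0
If (Count parameters>2)
  $start:=$3->  // resume where the previous call left off
Else 
  $start:=1
End if 
$oldStart:=$start

If (Count parameters>3)
  $wordDelim:=$4
Else 
  $wordDelim:=" "
End if 
$wdLen:=Length($wordDelim)

$maxed:=False
$mustEnd:=False

For ($i;1;$index)
  $posFound:=Position($wordDelim;$text;$start;*)  // * = compare by character code

  $oldStart:=$start  // start of the current word
  If ($posFound#0)
    $start:=$posFound+$wdLen  // skip past the delimiter
  Else 
    If ($i=$index)
      $mustEnd:=True  // the requested word is the last one in the text
    End if 
    $i:=$index+1  // no more delimiters: leave the loop
    $posFound:=Length($text)+$wdLen  // treat end of text as the closing delimiter
  End if 

End for 
$0:=Substring($text;$oldStart;$posFound-$oldStart)


If (Count parameters>2)
  // hand back the next start position, or -1 when the last word was returned
  $3->:=Choose($mustEnd;-1;$posFound+$wdLen)
End if 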
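
A minimal sketch of the loop usage mentioned in the header comment, assuming the 
method above is installed as Parse_GetNthWordFromPos.  The variable names 
($source, $line, $pos, $lines) are only illustrative.  Asking for word 1 on every 
call and passing ->$pos lets each call pick up where the previous one stopped; 
the pointer comes back as -1 once the last piece has been returned:

```4d
C_TEXT($source;$line)
C_LONGINT($pos)
ARRAY TEXT($lines;0)

  // illustrative data: three CR-delimited lines
$source:="first line"+Char(13)+"second line"+Char(13)+"third line"
$pos:=1
Repeat 
    // always fetch word 1, starting from the position left by the previous call
  $line:=Parse_GetNthWordFromPos ($source;1;->$pos;Char(13))
  APPEND TO ARRAY($lines;$line)
Until ($pos=-1)
```

The source text is never shortened here, so nothing has to be re-copied between 
calls.  Text that ends with a delimiter should yield one extra empty element at 
the end, which the caller may want to drop.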


Keith - CDI

> On Nov 16, 2016, at 1:12 PM, Chip Scheide <[email protected]> wrote:
> 
> I have a routine which parses text.
> It seemed to function well, until recently, when I had to feed it 50 
> megs of text (48.3 million characters).
> The data is Cr delimited, and each line of text is of variable length.
> 
> I am using the below mentioned truncate option, so each time the 
> source/original text is shorter.
> 
> 
> it takes a LONG time to process.
> the basic scheme is:
> - Locate desired delimiter (1 or more characters) occurrence (1 or more 
> times)
> - return text between either start of text, or previous delimiter and 
> final
> - optionally truncate original text removing located text.
> 
> ex:
> utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
> 
> if truncating, the original text ("A,B,C,D,E,F") would become "D,E,F"
> 
> The routine uses Substring, and Position to accomplish this task.
> 
> Does anyone have a "better" text parser?
> 
> 
> --------
> Follows my parsing code:
>  //Project Method:  utl_parsestring
>  // $1 - text - to be searched
>  // $2 - integer - number of times to locate character
>  // $3 - string (optional) - the character to search for (default = Tab)
>  // $4 - pointer (optional) - pointer to initial string to allow truncation
>  //         (Destructive parsing)
> 
>  //RETURNS - text - text found between occurrence N and N-1 (preceding)
>  //instance of the separator character indicated
>  //Ex:  utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
>  //       utl_ParseString("A,B,C,D,E,F"; 1; ",") -> "A"
>  //       utl_ParseString("A,B,C,D,E,F"; 6; ",") -> "F"    
>  //       utl_ParseString("A,B,C,D,E,F"; 0; ",") -> ""    
> C_TEXT($0;$String;$1;$Return_Value)
> C_LONGINT($wanted;$2;$i;$Found)
> C_TEXT($Search;$3)
> C_POINTER($4;$Truncate)
> 
> $String:=$1  //string/text to be searched
> $Wanted:=$2  //the number of times to find the character in the incoming string
> 
> If (Count parameters=2)  //if this is looking just for tabs
>   $Search:=<>x_Tab
> Else   //assign passed string
>   $Search:=$3
> End if 
> 
> If (Count parameters=4)  //we want to destructively parse the incoming string
>   $Truncate:=$4  //pointer to value to truncate
> End if 
> 
> If ($Wanted>0) & ($String#"")  //if the number wanted is > 0 find instance
> 
>   For ($i;1;$Wanted)
>     $Found:=utl_text_Position ($Search;$String)  //locate next instance of character
> 
>     Case of 
>       : ($i<$Wanted) & ($Found>0)  //if the number of char wanted is not yet reached
>         $String:=Substring($String;$Found+1)
> 
>       : ($Wanted=$i) & ($Found>0)  //instance found
>         $Return_Value:=Substring($String;1;$Found-1)
> 
>         If (Count parameters=4)  //truncation was asked for, remove the returned string (and everything before it)
>           $Truncate->:=Substring($String;$Found+Length($Search))  //replace the incoming string with the truncated version (found removed)
>         End if 
> 
>       : ($Found=0)  //no more instances
>         $i:=$Wanted+1  //end loop
>         $Return_Value:=$String
> 
>         If (Count parameters=4)
>           $Truncate->:=""  //replace the incoming string with empty string
>         End if 
>     End case 
>   End for 
> Else   //else # wanted <= zero return empty string
>   $Return_Value:=""
> End if 
> $0:=$Return_Value
>  //
> **********************************************************************
> 4D Internet Users Group (4D iNUG)
> FAQ:  http://lists.4d.com/faqnug.html
> Archive:  http://lists.4d.com/archives.html
> Options: http://lists.4d.com/mailman/options/4d_tech
> Unsub:  mailto:[email protected]
> **********************************************************************

