Re: v12+ parsing text

Chip Scheide Wed, 16 Nov 2016 14:04:27 -0800

ok - doing some testing and recoding.
I do not quite understand....

I wrote code to implement substring (see far below)
I use it in a parsing routine (see below) on a text block of 2.7 
million characters.
time to process the entire block : 36.5 sec.


I use the exact same code, but with 4D's Substring command.
Time to process the entire block: 129.8 sec.

Why is the (presumably) compiled C code slower than interpreted 4D 
code, by a factor of about 3.5?


---------
Parsing Routine
(initialization code removed)
For ($i;1;$How_Many)
$Start_Loc:=l_Last_Position+1
$Found_Location:=utl_text_Position ($Find;$Source;$Start_Loc)

If ($Found_Location>0)  //found
l_Last_Position:=$Find_Length+$Found_Location-1
Else 
$i:=utl_Exit_Loop 
End if 
End for 

If ($i=MAXLONG)  //not found, or not found enough times
$Return_Text:=utl_text_Faster_Substring ($Source;$Start_Loc)
Else   //found requested occurrence count of Find
  //third parameter is a character count, not an end position
$Return_Text:=utl_text_Faster_Substring ($Source;$Start_Loc;$Found_Location-$Start_Loc)
End if 
$0:=$Return_Text
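
For anyone reading along outside 4D, here is a rough Python sketch of the same idea: keep the source intact and move a start offset forward on each search, instead of truncating the text after every hit. The name nth_field and its parameters are made up for illustration, and Python's str.find stands in for utl_text_Position; this is not a translation of the routine above, only the scanning scheme.

```python
def nth_field(source: str, wanted: int, delimiter: str = "\t") -> str:
    """Return the wanted-th (1-based) delimited field of source.

    Hypothetical helper: returns "" when wanted < 1 or source is empty,
    and the remaining tail when there are fewer delimiters than requested.
    """
    if wanted < 1 or not source:
        return ""

    start = 0  # 0-based index where the current field begins
    for _ in range(wanted - 1):
        # Resume the search at start; the source itself is never shortened.
        found = source.find(delimiter, start)
        if found == -1:
            return source[start:]  # ran out of delimiters: return the tail
        start = found + len(delimiter)  # skip the whole delimiter

    end = source.find(delimiter, start)
    return source[start:] if end == -1 else source[start:end]


if __name__ == "__main__":
    print(nth_field("A,B,C,D,E,F", 3, ","))
```

Only one slice is taken, at the very end, so the cost of copying text does not grow with the number of delimiters skipped.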


----------------
 //Project Method: utl_text_Faster_Substring
  //$1 - text - source text to find substring
  //$2 - longint - Start Location
  //$3 - longint (optional) - Character count, 
  //   if not provided, or zero, return all beginning at $2

  //faster substring code

  // ∙ Created 11/16/16 by Chip - 
C_TEXT($1;$Source;$0;$Return_Text)
C_LONGINT($2;$Start_Location;$3;$Return_Length)
C_LONGINT($i;$Current_Char;$Source_Length)

$Source:=$1
$Start_Location:=$2
$Source_Length:=Length($Source)

Case of 
: (Count parameters=2)  //no count passed, run to end of source
$Return_Length:=$Source_Length

: ($3=0)  //zero count, run to end of source
$Return_Length:=$Source_Length
Else 
$Return_Length:=$3
End case 

For ($i;1;$Return_Length)  //$i counts characters copied, not source positions
$Current_Char:=$i+$Start_Location-1

If ($Current_Char<=$Source_Length)
$Return_Text:=$Return_Text+$Source≤$Current_Char≥
Else   //ran past end of source, keep what was copied so far
$i:=utl_Exit_Loop 
End if 
End for 
$0:=$Return_Text
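
To make the contract in the header comments concrete (1-based start, optional character count, zero or missing count meaning "through the end"), here is a small Python sketch. The function name substring and the clamping of a start below 1 are my own assumptions for illustration; I am not claiming 4D's Substring command behaves identically in every edge case.

```python
def substring(source: str, start: int, count: int = 0) -> str:
    """Return count characters of source beginning at 1-based position start.

    Hypothetical helper: a count of 0 (or less) means "everything from start
    to the end". Requests that run past the end are clamped, not emptied.
    """
    if start < 1:
        start = 1  # assumption: treat out-of-range starts as position 1
    begin = start - 1  # convert to Python's 0-based indexing
    if count <= 0:
        return source[begin:]
    return source[begin:begin + count]


if __name__ == "__main__":
    # Position 5 of "A,B,C,D,E,F" is "C"; a count of 1 returns just that.
    print(substring("A,B,C,D,E,F", 5, 1))
    # Passing an end position (5) where a count is expected returns too much.
    print(substring("A,B,C,D,E,F", 5, 5))
```

The second call shows why the third argument has to be a length: handing in an end position silently returns text beyond the delimiter.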





On Wed, 16 Nov 2016 12:55:41 -0800, Douglas von Roeder wrote:
> Chip:
> 
> If you haven't grabbed a copy already
> <http://www.pluggers.nl/product/api-pack/>, API Pack has a few BLOB
> routines that you might find handy including API Find in Blob, API Replace
> in Blob.
> 
> --
> Douglas von Roeder
> 949-336-2902
> 
> On Wed, Nov 16, 2016 at 12:44 PM, Alan Chan <[email protected]> wrote:
> 
>> 1) Use Position with a starting position
>> 2) Use Position with * if possible - huge performance difference
>> 3) Never change the size of your source or result during processing - this is the
>> major performance issue
>> 4) If your library is often used with large sources/results, try using
>> a BLOB, which would be very fast.
>> 
>> Alan Chan
>> 
>> 
>> 4D iNug Technical <[email protected]> writes:
>>> I have a routine which parses text.
>>> It seemed to function well, until recently, when I had to feed it 50
>>> megs of text (48.3 million characters).
>>> The data is Cr delimited, and each line of text is of variable length.
>>> 
>>> I am using the below mentioned truncate option, so each time the
>>> source/original text is shorter.
>>> 
>>> 
>>> it takes a LONG time to process.
>>> the basic scheme is:
>>> - Locate desired delimiter (1 or more characters) occurrence (1 or more
>>> times)
>>> - return text between either start of text, or previous delimiter and
>>> final
>>> - optionally truncate original text removing located text.
>>> 
>>> ex:
>>> utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
>>> 
>>> if truncating, the original text ("A,B,C,D,E,F") would become "D,E,F"
>>> 
>>> The routine uses Substring, and Position to accomplish this task.
>>> 
>>> Does anyone have a "better" text parser?
>>> 
>>> 
>>> --------
>>> Follows my parsing code:
>>>  //Project Method:  utl_parsestring
>>>  // $1 - text - to be searched
>>>  // $2 - integer - number of times to locate character
>>>  // $3 - string (optional ) - the character to search for (default =
>>> Tab)
>>>  // $4 - pointer (optional) - pointer to initial string to allow
>>> truncation
>>>  //         (Destructive parsing)
>>> 
>>>  //RETURNS - text - text found between occurrence N and N-1 (preceding)
>>>  //instance of the separator character indicated
>>>  //Ex:  utl_ParseString("A,B,C,D,E,F"; 3; ",") -> "C"
>>>  //       utl_ParseString("A,B,C,D,E,F"; 1; ",") -> "A"
>>>  //       utl_ParseString("A,B,C,D,E,F"; 6; ",") -> "F"
>>>  //       utl_ParseString("A,B,C,D,E,F"; 0; ",") -> ""
>>> C_TEXT($0;$String;$1;$Return_Value)
>>> C_LONGINT($wanted;$2;$i;$Found)
>>> C_TEXT($Search;$3)
>>> C_POINTER($4;$Truncate)
>>> 
>>> $String:=$1  //string/text to be searched
>>> $Wanted:=$2  //the number of times to find the character in the
>>> incoming string
>>> 
>>> If (Count parameters=2)  //if this is looking just for tabs
>>> $Search:=<>x_Tab
>>> Else   //assign passed string
>>> $Search:=$3
>>> End if
>>> 
>>> If (Count parameters=4)  //we want to destructively parse the incoming
>>> string
>>> $Truncate:=$4  //pointer to value to truncate
>>> End if
>>> 
>>> If ($Wanted>0) & ($String#"")  //if the number wanted is > 0 find
>>> instance
>>> 
>>> For ($i;1;$Wanted)
>>> $Found:=utl_text_Position ($Search;$String)  //locate next instance of
>>> character
>>> 
>>> Case of
>>> : ($i<$Wanted) & ($Found>0)  //if the number of char wanted is not yet
>>> reached
>>> $String:=Substring($String;$Found+Length($Search))  //skip the whole delimiter
>>> 
>>> : ($Wanted=$i) & ($Found>0)  //instance found
>>> $Return_Value:=Substring($String;1;$Found-1)
>>> 
>>> If (Count parameters=4)  //truncation was asked for, remove the
>>> returned string (and everything before it)
>>> $Truncate->:=Substring($String;$Found+Length($Search))  //replace the
>>> incoming string with the truncated version (found removed)
>>> End if
>>> 
>>> : ($Found=0)  //no more instances
>>> $i:=$Wanted+1  //end loop
>>> $Return_Value:=$String
>>> 
>>> If (Count parameters=4)
>>> $Truncate->:=""  //replace the incoming string with empty string
>>> End if
>>> End case
>>> End for
>>> Else   //else # wanted <= zero return empty string
>>> $Return_Value:=""
>>> End if
>>> $0:=$Return_Value
>>>  //
>>> **********************************************************************
>>> 4D Internet Users Group (4D iNUG)
>>> FAQ:  http://lists.4d.com/faqnug.html
>>> Archive:  http://lists.4d.com/archives.html
>>> Options: http://lists.4d.com/mailman/options/4d_tech
>>> Unsub:  mailto:[email protected]
>>> **********************************************************************