Re: v12+ parsing text [summary]

Chip Scheide Thu, 17 Nov 2016 14:23:08 -0800

for posterity  :)

My new parsing routine (followed by new substring routine)


  //Project Method: utl_Text_Fast_Parse
  //$1 - pointer - to text to parse
  //$2 - longint (optional) - number of times to find character(s) (default is one)
  //$3 - text (optional) - text delimiter to find (default is tab)

  //rewritten utl_text_ParseString

  //l_Last_Position is a 'pointer' to the last character in the source text that was found
  //in any previous call to this method - on the same source text 

  //NOTE : MUST call utl_Text_Fast_Parse_Init first

  //Ex (with $Text:="A,B,C,D,E,F", each call made right after utl_Text_Fast_Parse_Init):
  //       utl_Text_Fast_Parse(->$Text; 3; ",") -> "C"
  //       utl_Text_Fast_Parse(->$Text; 1; ",") -> "A"
  //       utl_Text_Fast_Parse(->$Text; 6; ",") -> "F"

  //RETURNS - text - text located between the last occurrence and the most
  //    recent occurrence of the Find text, or text between the last occurrence and the end of source
  //    if the Find text is not located
  // ∙ Created 11/16/16 by Chip - 
C_LONGINT(l_Last_Position;$Start_Loc;$Current_Loc;$2;$How_Many;$Find_Length)
C_LONGINT($i;$Found_Location)
C_TEXT($3;$Find;$0;$Return_Text)
C_POINTER($1;$Source)  //pointer for compatibility with old utl_text_ParseString
C_BOOLEAN($Truncate)

$Source:=$1

Case of 
: (Count parameters=1)
$Find:=<>x_Tab
$How_Many:=1

: (Count parameters=2)  //2 parameters
$How_Many:=$2
$Find:=<>x_Tab

: (Count parameters>=3) & ($3#"")  //3 parameters and not blank
$Find:=$3
$How_Many:=$2

: (Count parameters>=3)  //3 parameters and blank
$How_Many:=$2
$Find:=<>x_Tab
End case 
$Find_Length:=Length($Find)

For ($i;1;$How_Many)  //for however many delimiters requested
$Start_Loc:=l_Last_Position+1  //start at the next character after last iteration
$Found_Location:=utl_text_Position ($Find;$Source->;$Start_Loc)

If ($Found_Location>0)  //found
l_Last_Position:=$Find_Length+$Found_Location-1
Else   //does not exist
$i:=utl_Exit_Loop 
End if 
End for 

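Case of 
: ($i=MAXLONG)  //not found, or not found enough times: return the rest of the source
$Return_Text:=utl_text_Faster_Substring ($Source;$Start_Loc)
: ($Found_Location=$Start_Loc)  //empty field; a zero length would mean "return everything"
$Return_Text:=""
Else   //found requested occurrence count of Find; the delimiter itself is excluded
$Return_Text:=utl_text_Faster_Substring ($Source;$Start_Loc;$Found_Location-$Start_Loc)
End case 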
$0:=$Return_Text
  //End utl_Text_Fast_Parse
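
A minimal usage sketch, in case it helps anyone reading the archive later. The names $Line, $Field, $Fields and $n are made up for the example, and since utl_Text_Fast_Parse_Init is not shown in this post, I am only assuming that it resets l_Last_Position to 0. The field count is hard-coded because the routine as posted does not signal when the source is exhausted.

```4d
  //Hypothetical caller: pull the fields out of one comma-delimited line
  //ASSUMPTION: utl_Text_Fast_Parse_Init just does l_Last_Position:=0
C_TEXT($Line;$Field)
C_LONGINT($n)
ARRAY TEXT($Fields;0)

$Line:="A,B,C,D,E,F"
utl_Text_Fast_Parse_Init   //reset the shared position before starting a new source text

For ($n;1;6)  //six fields are known to be present in this example
  //each call continues from where the previous call stopped
$Field:=utl_Text_Fast_Parse (->$Line;1;",")
APPEND TO ARRAY($Fields;$Field)
End for 
```

Because the position lives in l_Last_Position, two source texts cannot be parsed in an interleaved fashion within the same process without calling the Init method in between.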
-------------------

  //Project Method: utl_text_Faster_Substring
  //$1 - pointer - to the source text to take the substring from
  //$2 - longint - Start Location
  //$3 - longint (optional) - Character count, 
  //   if not provided, or zero, return all beginning at $2

  //faster substring code

  // ∙ Created 11/16/16 by Chip - 
C_POINTER($1;$Source)
C_TEXT($0;$Return_Text)
C_LONGINT($2;$Start_Location;$3;$Return_Length)
C_LONGINT($i;$Source_Length;$Current_Char)

$Source:=$1
$Start_Location:=$2
$Source_Length:=Length($Source->)

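Case of 
: (Count parameters=2)  //no count: everything from the start location to the end
$Return_Length:=$Source_Length-$Start_Location+1

: ($3=0)  //zero count: same as no count
$Return_Length:=$Source_Length-$Start_Location+1
Else 
$Return_Length:=$3
End case 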

Case of 
  //these values need to be tweaked, as they are just guesses
  //but looping over the characters *IS* faster than Substring -
  //for some lengths these values worked well as a starting point
: (($Return_Length<=30) & (Not(Is compiled mode))) | ((Is compiled mode) & ($Return_Length<=130))

For ($i;1;$Return_Length)
$Current_Char:=$i+$Start_Location-1

If ($Current_Char<=$Source_Length)
$Return_Text:=$Return_Text+$Source->≤$Current_Char≥
Else   //ran past the end of the source: keep what was collected, as Substring does
$i:=utl_Exit_Loop 
End if 
End for 
Else   //long return length use substring - faster
$Return_Text:=Substring($Source->;$Start_Location;$Return_Length)
End case 
$0:=$Return_Text
  //End utl_text_Faster_Substring
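
Since the 30 and 130 cutoffs above are only guesses, here is a rough timing sketch for tuning them. It is a hypothetical test method, not something from my import code: the variable names, the 1000-character sample text and the 100000 repetitions are all arbitrary choices. Run it both interpreted and compiled, and change $Length to find where Substring starts to win.

```4d
  //Hypothetical test method: time the character loop against Substring for one length
C_TEXT($Source;$Result)
C_LONGINT($i;$j;$Length;$Start;$Loop_ms;$Substring_ms)

$Source:="x"*1000  //sample text; the content does not matter
$Length:=30  //candidate cutoff to test

$Start:=Milliseconds
For ($i;1;100000)
$Result:=""
For ($j;1;$Length)
$Result:=$Result+$Source[[$j]]  //character reference (displayed as ≤≥ in the method above)
End for 
End for 
$Loop_ms:=Milliseconds-$Start

$Start:=Milliseconds
For ($i;1;100000)
$Result:=Substring($Source;1;$Length)
End for 
$Substring_ms:=Milliseconds-$Start

ALERT("Loop: "+String($Loop_ms)+" ms; Substring: "+String($Substring_ms)+" ms")
```

The sketch reads the text directly rather than through a pointer, so the absolute numbers will differ somewhat from what utl_text_Faster_Substring sees; only the crossover point is of interest.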



On Thu, 17 Nov 2016 13:41:48 -0800, Douglas von Roeder wrote:
> Chip:
> 
> Nice recap.
> 
> I'm interested in understanding the difference between passing a pointer
> and dereferencing the pointer during the operation versus passing a
> pointer, working on a local, and then doing Copy
> array($localTextArr_AT;$arrayPtr_P->).
> 
> Over the years, I've wondered about the performance penalty of passing by
> reference and, when I asked the question at the Summit, the immediate
> answer was that operations took 1.6 times as long.
> 
> With that in mind, I'm following the Copy array approach when working with
> anything but trivial amounts of data. Given that you're dealing with large
> amounts of data, it might be interesting to see if the 1 minute elapsed
> time could be reduced by that change.
> 
> 
> --
> Douglas von Roeder
> 949-336-2902
> 
> On Thu, Nov 17, 2016 at 1:29 PM, Alan Chan <[email protected]> wrote:
> 
>> Isn't it fun and rewarding:-)
>> 
>> Alan Chan
>> 
>> 4D iNug Technical <[email protected]> writes:
>>> My new code imports the same 50 meg file (compiled) in just over 1
>>> minute.
>> 
>> **********************************************************************
>> 4D Internet Users Group (4D iNUG)
>> FAQ:  http://lists.4d.com/faqnug.html
>> Archive:  http://lists.4d.com/archives.html
>> Options: http://lists.4d.com/mailman/options/4d_tech
>> Unsub:  mailto:[email protected]
>> **********************************************************************
>> 
