Re: Regex (MatchText) speed

2004-06-25 Thread Troy Rollins
On Jun 25, 2004, at 2:43 AM, Ken Ray wrote:
> For a whole bunch of ways to increase performance, check out Wil Dijkstra's
> awesome research into increasing script performance:
There is some good stuff there...
I just went through and removed all of my put i + 1 into i routines.  
;-)
--
Troy
RPSystems, Ltd.
http://www.rpsystems.net

___
use-revolution mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/use-revolution


Re: Regex (MatchText) speed

2004-06-24 Thread Troy Rollins
On Jun 24, 2004, at 1:49 AM, Brian Yennie wrote:
> For example, if I wanted to remove HTML tags from text, I'd probably 
> use something hand-crafted for speed. If I wanted to verify a valid 
> URL in a strict standards-compliant sense, I'd probably drop in the 
> nasty RegEx because I wouldn't trust my hand-crafted code to catch 
> everything without a ton of effort.
Hand crafted for speed: that is the part I'm going to have to nail 
down, through some testing. In other tools I work with, the regex 
almost always comes out on top, especially after the first round, since 
patterns have been cached in the engine. It is sounding to me as though 
in Rev that might not be the case, and a couple hundred lines of hand 
parsing may actually be faster than a single line matchText and a 
bunch of fillable back references. Given Rev's string-based nature, I'm 
not totally surprised if that is the case. Well... I'm pretty 
surprised, but not totally.

I really don't want to try optimizing this twice (it is big), so if 
anyone has actual experience at comparing the two for speed, I'd 
still take any input at all. At this point I'm figuring to do this part 
later tomorrow or Friday, so cast yer votes!  ;-)
--
Troy
RPSystems, Ltd.
http://www.rpsystems.net



Re: Regex (MatchText) speed

2004-06-24 Thread Brian Yennie
> take any input at all. At this point I'm figuring to do this part 
> later tomorrow or Friday, so cast yer votes!  ;-)
Well, here are a few tips for the bucket:
1) Offset() is the fastest call you've got. Try to avoid lineOffset() 
or itemOffset() if possible. Contains can also be slower, for some 
reason.

2) Determine if you need case-sensitivity. Setting the caseSensitive to 
TRUE (it's FALSE by default) provides around a 50% speedup by itself. 
Of course you can only use it if you actually are doing case sensitive 
work, OR you are able to convert things with upper() or lower() first 
and then restore them later in a clever fashion.

3) If things get complicated, consider using state variables to track 
where you are

4) Try to do everything in one pass (or as few as possible) if the data 
is large

5) Always use repeat for each when looping, not repeat with.

6) If you've got it working and it just needs a little more pumping, 
people on list (who me?) tend to like to optimize things that get 
posted here provided that the poster has taken a good whack at it first 
=).
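
A minimal sketch of tips 1, 2, 4 and 5 in Transcript. The handler names
countHits and countLinesContaining and their parameters are made up for
illustration; they are not from any library, and nothing here has been
timed.

```livecode
-- Sketch only: hypothetical handlers illustrating the tips above.

-- Count occurrences of pNeedle in pText in a single pass with offset().
function countHits pNeedle, pText
  -- Only safe when the search really is case sensitive (tip 2).
  set the caseSensitive to true
  put 0 into tCount
  put 0 into tSkip
  repeat
    -- offset() returns a position relative to the skipped chars, or 0.
    put offset(pNeedle, pText, tSkip) into tPos
    if tPos = 0 then exit repeat
    add 1 to tCount
    -- Resume just past the end of this match.
    add tPos + length(pNeedle) - 1 to tSkip
  end repeat
  return tCount
end countHits

-- Walk the lines with "repeat for each" rather than "repeat with" (tip 5).
function countLinesContaining pNeedle, pText
  put 0 into tCount
  repeat for each line tLine in pText
    if offset(pNeedle, tLine) > 0 then add 1 to tCount
  end repeat
  return tCount
end countLinesContaining
```

The caseSensitive property is local to the handler that sets it, so it
should not leak into the caller.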

HTH,
- Brian


Re: Regex (MatchText) speed

2004-06-24 Thread Mark Brownell
On Wednesday, June 23, 2004, at 10:39  PM, Troy Rollins wrote:
> Thanks Mark. I'll look at this.
> And, while I don't doubt this has provided good results, may I ask why 
> you use it instead of a matchText? You didn't indicate if you had done 
> some form of speed test, or in fact ever compared the two 
> alternatives. I'm just trying to get a sense of the value of matchText 
> in terms of speed, or if it is a convenience for those more 
> comfortable with regular expression syntax (which I can use... but I 
> FIGHT with them every time.)
> --
> Troy
I did do speed tests and my pull-parser was a little faster. It's also 
easier to use for me. I can substitute variables for chunks instead of 
those more hard-coded forms.
--put getElement(spot1, spot2, myText) into theElement
Mark



Re: Regex (MatchText) speed

2004-06-24 Thread Mark Wieder
Mark-

Thursday, June 24, 2004, 7:03:48 AM, you wrote:

MB> You learn something new everyday around here. I didn't know that
MB> offset() worked faster in caseSensitive to TRUE mode.

Hindsight is wonderful. It makes sense, now that I think about it. If
caseSensitive were false the engine would have to do a couple of
conversions before each test. That would have to slow it down. I gotta
file this tip away somewhere.

-- 
-Mark Wieder
 [EMAIL PROTECTED]



Regex (MatchText) speed

2004-06-23 Thread Troy Rollins
I took a look through the archives, and didn't see anything definitive 
about speed advantages in Rev of using matchText with regEx, compared 
to more basic chunking techniques - including contains. I find Rev 
is a string handling monster with Transcript alone, but don't know just 
what it is actually doing with Regex... for instance is it a 
Transcript regex engine or some kind of compiled external? Or a 
compiled internal?

I noted that Tuviah says that the regex engine caches the last 20 
patterns, but...

Anyone have a real-world sense of the speed difference? I have a 
parsing routine which I put together hastily, knowing that it would 
need to be later optimized. I'm edging in on that optimization phase, 
and I'm wondering what angle I might want to approach it. Speed is 
definitely a concern.

--
Troy
RPSystems, Ltd.
http://www.rpsystems.net


Re: Regex (MatchText) speed

2004-06-23 Thread Richard Gaskin
Troy Rollins wrote:
> I took a look through the archives, and didn't see anything definitive 
> about speed advantages in Rev of using matchText with regEx, compared to 
> more basic chunking techniques - including contains. I find Rev is a 
> string handling monster with Transcript alone, but don't know just what 
> it is actually doing with Regex... for instance is it a Transcript 
> regex engine or some kind of compiled external? Or a compiled internal?
>
> I noted that Tuviah says that the regex engine caches the last 20 
> patterns, but...
>
> Anyone have a real-world sense of the speed difference? I have a parsing 
> routine which I put together hastily, knowing that it would need to be 
> later optimized. I'm edging in on that optimization phase, and I'm 
> wondering what angle I might want to approach it. Speed is definitely a 
> concern.
Results can vary, depending on what you're doing.  The best method 
(though admittedly tedious) is to implement both and time them.
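
A rough sketch of such a timing comparison, using the milliseconds. The
handler name compareSpeeds, the sample string, the pattern and the loop
count are all assumptions chosen for illustration, and no timings are
claimed here.

```livecode
-- Sketch only: hypothetical benchmark of matchText() against offset().
on compareSpeeds
  put "<a href=" & quote & "http://example.com/" & quote & ">" into tSample
  put "href=" & quote into tStartTag
  -- Pattern built once: href="([^"]*)"
  put tStartTag & "([^" & quote & "]*)" & quote into tPattern
  put 10000 into tRuns

  -- Time the regex version.
  put the milliseconds into tStart
  repeat tRuns times
    get matchText(tSample, tPattern, tUrl1)
  end repeat
  put the milliseconds - tStart into tRegexTime

  -- Time the offset() version of the same extraction.
  put the milliseconds into tStart
  repeat tRuns times
    -- tFrom is the first char after the opening quote.
    put offset(tStartTag, tSample) + length(tStartTag) into tFrom
    -- Position of the closing quote, counted from tFrom.
    put offset(quote, tSample, tFrom - 1) into tRel
    put char tFrom to (tFrom + tRel - 2) of tSample into tUrl2
  end repeat
  put the milliseconds - tStart into tOffsetTime

  put "regex:" && tRegexTime && "ms, offset:" && tOffsetTime && "ms"
end compareSpeeds
```

Because the same pattern is reused on every pass, any pattern caching in
the engine works in the regex's favour here; real data with many
different patterns may behave differently.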

In one specific case I needed to parse HTML attributes, and used both 
regex and a combination of offset and replace, and the more generalized 
regex took about twice as long.  I've seen similar results with parsing 
HTML tags, but have done little benchmarking with regex on the 
assumption that its generalized conveniences will usually perform 
slower than a custom algorithm for the job at hand.

I would love to be proven wrong, however; crafting custom algorithms for 
every little text parsing task is indeed tedious. :)

--
 Richard Gaskin
 Fourth World Media Corporation
 ___
 Rev tools and more:  http://www.fourthworld.com/rev


Re: Regex (MatchText) speed

2004-06-23 Thread Mark Brownell
On Wednesday, June 23, 2004, at 06:25  PM, Troy Rollins wrote:
> Anyone have a real-world sense of the speed difference? I have a 
> parsing routine which I put together hastily, knowing that it would 
> need to be later optimized. I'm edging in on that optimization phase, 
> and I'm wondering what angle I might want to approach it. Speed is 
> definitely a concern.
I tried fooling around with a few things to pull-parse non-SGML 
well-formed text. I got good results from offset() in some cases.

Here's this pull-parser stuff again:

-- put getElement("<record>", "</record>", tZap) into theElement
function getElement tStTag, tEdTag, stngToSch
  put empty into zapped
  put the number of chars in tStTag into dChars
  put offset(tStTag,stngToSch) into tNum1
  put offset(tEdTag,stngToSch) into tNum2
  -- offset() returns 0 when the tag is not found
  if tNum1 < 1 then
    return "error"
    exit getElement
  end if
  if tNum2 < 1 then
    return "error"
    exit getElement
  end if
  -- everything between the end of the start tag and the end tag
  put char (tNum1 + dChars) to (tNum2 - 1) of stngToSch into zapped
  return zapped
end getElement

-- put getAttribute("name", tZap) into theAttribute
function getAttribute tAttribute, strngToSearch
  put empty into zapA
  put quote into Qx
  -- search string: a space, the attribute name, = and the opening quote
  if char 1 of tAttribute = space then
    put tAttribute & "=" & Qx into tAttributeX
  else
    put space & tAttribute & "=" & Qx into tAttributeX
  end if
  put the number of chars in tAttributeX into dChars
  put offset(tAttributeX,strngToSearch) into tNum1
  if tNum1 < 1 then
    return "error"
    exit getAttribute
  end if
  put tNum1 + dChars into tNumX
  -- look for the closing quote after skipping tNumX chars
  put offset(Qx,strngToSearch,tNumX) into tNumZ
  if tNumX < 1 then
    return "error"
    exit getAttribute
  end if
  if tNumZ < 1 then
    return "error"
    exit getAttribute
  end if
  put char tNumX to (tNumX + (tNumZ - 1)) of strngToSearch into zapA
  return zapA
end getAttribute

-- put getElementsArray("<record>", "</record>", tZap) into theArray
function getElementsArray tStartTag, tEndTag, StringToSearch
  put empty into tArray
  put 0 into tStart1
  put 0 into tStart2
  put 1 into tElementNum
  put the number of chars in tStartTag into dChars
  repeat
    -- tStart1 and tStart2 track the absolute positions of the last tags found
    put offset(tStartTag,StringToSearch,tStart1) into tNum1
    put (tNum1 + tStart1) into tStart1
    if tNum1 < 1 then exit repeat
    put offset(tEndTag,StringToSearch,tStart2) into tNum2
    put (tNum2 + tStart2) into tStart2
    if tNum2 < 1 then exit repeat
    --if tNum2 < tNum1 then exit repeat
    put char (tStart1 + dChars) to (tStart2 - 1) of StringToSearch into zapped
    put zapped into tArray[tElementNum]
    add 1 to tElementNum
  end repeat
  return tArray
end getElementsArray

Mark


Re: Regex (MatchText) speed

2004-06-23 Thread Troy Rollins
On Jun 23, 2004, at 9:37 PM, Mark Brownell wrote:

>> Anyone have a real-world sense of the speed difference? I have a 
>> parsing routine which I put together hastily, knowing that it would 
>> need to be later optimized. I'm edging in on that optimization phase, 
>> and I'm wondering what angle I might want to approach it. Speed is 
>> definitely a concern.
> I tried fooling around with a few things to pull-parse non-SGML 
> well-formed text. I got good results from offset() in some cases.
I only wish I were parsing any kind of formed text. The stuff I am 
parsing is more like chaos. It can be anything from plain English error 
messages, to server directory lists, to date and time formats... with 
NO consistent formatting.

The first job of my parser is to simply try to determine what the heck 
it is parsing. I currently have it functional, using if-else ifs, 
contains, etc. I'm considering replacing ALL of that with one powerful 
matchText, but if that only ends up costing me time rather than saving 
it...
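
For reference, a small sketch of what a mixed approach might look like:
a cheap literal test first, then matchText() with back references for
the cases that need a pattern. The handler name classifyLine, the three
categories and the date pattern are hypothetical and only stand in for
the kinds of input described above.

```livecode
-- Sketch only: hypothetical classifier, not the parser discussed above.
function classifyLine pLine
  -- Literal test first: offset() needs no pattern matching.
  if offset("error", pLine) > 0 then return "error message"
  -- matchText() fills tMonth, tDay and tYear from the parenthesised groups,
  -- so they are available to the caller's logic once the match succeeds.
  if matchText(pLine, "([0-9]+)/([0-9]+)/([0-9]+)", tMonth, tDay, tYear) then
    return "date"
  end if
  return "unknown"
end classifyLine
```

Whether one large pattern beats a chain of tests like this is exactly
the open question; only timing both on real input would settle it.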

--
Troy
RPSystems, Ltd.
http://www.rpsystems.net


Re: Regex (MatchText) speed

2004-06-23 Thread Mark Brownell
On Wednesday, June 23, 2004, at 06:50  PM, Troy Rollins wrote:
> I only wish I were parsing any kind of formed text. The stuff I am 
> parsing is more like chaos. It can be anything from plain English 
> error messages, to server directory lists, to date and time formats... 
> with NO consistent formatting.
>
> The first job of my parser is to simply try to determine what the heck 
> it is parsing. I currently have it functional, using if-else ifs, 
> contains, etc. I'm considering replacing ALL of that with one powerful 
> matchText, but if that only ends up costing me time rather than saving 
> it...
-- put getElement("<record>", "</record>", myText) into theElement
-- put getElementsArray("<record>", "</record>", myText) into theArray
These two functions work with any text for the start spot and any text 
for the end spot.

If you were parsing an email message you could use something like this:
-- use the getElement() function from my last message.
-- put getElement("<record>", "</record>", tZap) into theElement
From: Troy Rollins [EMAIL PROTECTED]
Date: Wed Jun 23, 2004  6:50:59  PM US/Pacific
To: How to use Revolution [EMAIL PROTECTED]
Subject: Re: Regex (MatchText) speed
Reply-To: How to use Revolution [EMAIL PROTECTED]
put getElement("From:", "Date:", yourEmail) into mFrom
put getElement("Date:", "To:", yourEmail) into mDate
put getElement("To:", "Subject:", yourEmail) into mTo
put getElement("Subject:", "Reply-To:", yourEmail) into mSubject
The examples above would require stripping the line returns.
To get the body of the email you could use:
put getElement("Reply-To:", "From:", yourEmail) into mBody
...and strip the first line of mBody
Mark


Re: Regex (MatchText) speed

2004-06-23 Thread Troy Rollins
On Jun 24, 2004, at 1:08 AM, Mark Brownell wrote:
> These two functions work with any text for the start spot and any text 
> for the end spot.
Thanks Mark. I'll look at this.
And, while I don't doubt this has provided good results, may I ask why 
you use it instead of a matchText? You didn't indicate if you had done 
some form of speed test, or in fact ever compared the two alternatives. 
I'm just trying to get a sense of the value of matchText in terms of 
speed, or if it is a convenience for those more comfortable with 
regular expression syntax (which I can use... but I FIGHT with them 
every time.)
--
Troy
RPSystems, Ltd.
http://www.rpsystems.net



Re: Regex (MatchText) speed

2004-06-23 Thread Brian Yennie
Troy,
My 2 cents- it totally depends on what kind of RegEx you're ending up 
with.

RegEx buys you power for expressing complex rules. A hand crafted 
solution will almost always run faster, it's just a matter of whether 
you can afford to write new code for every case instead of just 
accumulating expressions and letting the RegEx engine do the work.

For example, if I wanted to remove HTML tags from text, I'd probably 
use something hand-crafted for speed. If I wanted to verify a valid URL 
in a strict standards-compliant sense, I'd probably drop in the nasty 
RegEx because I wouldn't trust my hand-crafted code to catch everything 
without a ton of effort.
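
To make the first case concrete, here is a minimal sketch of a
hand-crafted tag stripper built on offset(). The handler name stripTags
is made up for illustration, and the sketch assumes every "<" in the
text starts a tag; it makes no attempt to handle comments, scripts or
entities.

```livecode
-- Sketch only: hypothetical hand-rolled HTML tag remover.
function stripTags pHtml
  put empty into tOut
  put 0 into tSkip  -- number of chars already processed
  repeat
    put offset("<", pHtml, tSkip) into tOpen
    if tOpen = 0 then exit repeat
    -- Copy the text that precedes this tag.
    put char (tSkip + 1) to (tSkip + tOpen - 1) of pHtml after tOut
    -- Find the matching ">" after the "<".
    put offset(">", pHtml, tSkip + tOpen) into tClose
    -- An unterminated tag is simply dropped along with the rest of the text.
    if tClose = 0 then return tOut
    add tOpen + tClose to tSkip
  end repeat
  -- Copy whatever follows the last tag.
  put char (tSkip + 1) to -1 of pHtml after tOut
  return tOut
end stripTags
```

The regex counterpart would be a one-liner along the lines of
replaceText(pHtml, "<[^>]*>", empty); which of the two is faster on a
given input is something to measure rather than assume.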

- Brian