Re: Regex (MatchText) speed
On Jun 25, 2004, at 2:43 AM, Ken Ray wrote:

> For a whole bunch of ways to increase performance, check out Wil Dijkstra's awesome research into increasing script performance:

There is some good stuff there... I just went through and removed all of my "put i + 1 into i" routines. ;-)

--
Troy
RPSystems, Ltd.
http://www.rpsystems.net
___
use-revolution mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Regex (MatchText) speed
On Jun 24, 2004, at 1:49 AM, Brian Yennie wrote:

> For example, if I wanted to remove HTML tags from text, I'd probably use something hand-crafted for speed. If I wanted to verify a valid URL in a strict standards-compliant sense, I'd probably drop in the nasty RegEx because I wouldn't trust my hand-crafted code to catch everything without a ton of effort.

"Hand crafted for speed": that is the part I'm going to have to nail down, through some testing. In other tools I work with, the regex almost always comes out on top, especially after the first round, since patterns have been cached in the engine. It sounds to me as though in Rev that might not be the case, and a couple hundred lines of hand parsing may actually be faster than a single-line matchText and a bunch of fillable back references.

Given Rev's string-based nature, I'm not totally surprised if that is the case. Well... I'm pretty surprised, but not totally.

I really don't want to try optimizing this twice (it is big), so if anyone has actual experience comparing the two for speed, I'd still take any input at all. At this point I'm figuring to do this part later tomorrow or Friday, so cast yer votes! ;-)

--
Troy
RPSystems, Ltd.
http://www.rpsystems.net
Re: Regex (MatchText) speed
> take any input at all. At this point I'm figuring to do this part later tomorrow or Friday, so cast yer votes! ;-)

Well, here are a few tips for the bucket:

1) Offset() is the fastest call you've got. Try to avoid lineOffset() or itemOffset() if possible. Contains can also be slower, for some reason.
2) Determine if you need case-sensitivity. Setting the caseSensitive to TRUE (it's FALSE by default) provides around a 50% speedup by itself. Of course you can only use it if you actually are doing case-sensitive work, OR you are able to convert things with upper() or lower() first and then restore them later in a clever fashion.
3) If things get complicated, consider using state variables to track where you are.
4) Try to do everything in one pass (or as few as possible) if the data is large.
5) Always use "repeat for each" when looping, not "repeat with".
6) If you've got it working and it just needs a little more pumping, people on list (who me?) tend to like to optimize things that get posted here, provided that the poster has taken a good whack at it first =).

HTH,

- Brian
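Tips 2 and 5 can be sketched together in Transcript. This is only an illustration: the function name countErrorLines, its parameter, and the "ERROR" search string are hypothetical, and the approach only applies when the match really is meant to be case-sensitive.

```livecode
-- Hypothetical helper: counts the lines of pText that contain "ERROR" (exact case).
function countErrorLines pText
  -- caseSensitive is local to the handler; it reverts to false on exit.
  set the caseSensitive to true
  put 0 into tCount
  -- "repeat for each" walks the text once rather than re-locating line N on every pass.
  repeat for each line tLine in pText
    if offset("ERROR", tLine) > 0 then add 1 to tCount
  end repeat
  return tCount
end countErrorLines
```

Any actual speedup should be measured on your own data rather than assumed.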
Re: Regex (MatchText) speed
On Wednesday, June 23, 2004, at 10:39 PM, Troy Rollins wrote:

> Thanks Mark. I'll look at this. And, while I don't doubt this has provided good results, may I ask why you use it instead of a matchText? You didn't indicate if you had done some form of speed test, or in fact ever compared the two alternatives. I'm just trying to get a sense of the value of matchText in terms of speed, or if it is a convenience for those more comfortable with regular expression syntax (which I can use... but I FIGHT with them every time.)
> --
> Troy

I did do speed tests and my pull-parser was a little faster. It's also easier to use for me. I can substitute variables for chunks instead of those more hard-coded forms.

--put getElement(spot1, spot2, myText) into theElement

Mark
Re: Regex (MatchText) speed
Mark-

Thursday, June 24, 2004, 7:03:48 AM, you wrote:

MB> You learn something new everyday around here. I didn't know that
MB> offset() worked faster in caseSensitive to TRUE mode.

Hindsight is wonderful. It makes sense, now that I think about it. If caseSensitive were false the engine would have to do a couple of conversions before each test. That would have to slow it down.

I gotta file this tip away somewhere.

--
-Mark Wieder
[EMAIL PROTECTED]
Regex (MatchText) speed
I took a look through the archives, and didn't see anything definitive about speed advantages in Rev of using matchText with regEx, compared to more basic chunking techniques - including contains.

I find Rev is a string handling monster with Transcript alone, but don't know just what it is actually doing with Regex... for instance is it a Transcript regex engine or some kind of compiled external? Or a compiled internal? I noted that Tuviah says that the regex engine caches the last 20 patterns, but...

Anyone have a real-world sense of the speed difference? I have a parsing routine which I put together hastily, knowing that it would need to be later optimized. I'm edging in on that optimization phase, and I'm wondering what angle I might want to approach it. Speed is definitely a concern.

--
Troy
RPSystems, Ltd.
http://www.rpsystems.net
Re: Regex (MatchText) speed
Troy Rollins wrote:

> I took a look through the archives, and didn't see anything definitive about speed advantages in Rev of using matchText with regEx, compared to more basic chunking techniques - including contains.
>
> I find Rev is a string handling monster with Transcript alone, but don't know just what it is actually doing with Regex... for instance is it a Transcript regex engine or some kind of compiled external? Or a compiled internal? I noted that Tuviah says that the regex engine caches the last 20 patterns, but...
>
> Anyone have a real-world sense of the speed difference? I have a parsing routine which I put together hastily, knowing that it would need to be later optimized. I'm edging in on that optimization phase, and I'm wondering what angle I might want to approach it. Speed is definitely a concern.

Results can vary, depending on what you're doing. The best method (though admittedly tedious) is to implement both and time them.

In one specific case I needed to parse HTML attributes, and used both regex and a combination of offset and replace, and the more generalized regex took about twice as long.

I've seen similar results with parsing HTML tags, but have done little benchmarking with regex, on the assumption that its generalized conveniences will usually perform slower than a custom algorithm for the job at hand.

I would love to be proven wrong, however; crafting custom algorithms for every little text parsing task is indeed tedious. :)

--
Richard Gaskin
Fourth World Media Corporation
___
Rev tools and more: http://www.fourthworld.com/rev
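The "implement both and time them" approach might look roughly like the sketch below. Everything named here (hrefByRegex, hrefByOffset, timeBoth, and the href attribute used as the test case) is a hypothetical illustration, not the code used in the tests described above. Both versions carry the same function-call overhead, so only the difference between the two timings is meaningful.

```livecode
-- Regex version: capture the value of the first href="..." attribute.
function hrefByRegex pText
  put empty into tValue
  get matchText(pText, "href=" & quote & "([^" & quote & "]*)" & quote, tValue)
  return tValue
end hrefByRegex

-- Hand-crafted version of the same extraction using offset().
function hrefByOffset pText
  put "href=" & quote into tKey
  put offset(tKey, pText) into tPos
  if tPos < 1 then return empty
  put tPos + length(tKey) into tValStart
  -- Skip everything before the value, then find the closing quote.
  put offset(quote, pText, tValStart - 1) into tLen
  if tLen < 1 then return empty
  return char tValStart to (tValStart + tLen - 2) of pText
end hrefByOffset

-- Returns "regexMs,offsetMs" for pLoops runs of each approach.
function timeBoth pText, pLoops
  put the milliseconds into tStart
  repeat pLoops times
    get hrefByRegex(pText)
  end repeat
  put the milliseconds - tStart into tRegexMs

  put the milliseconds into tStart
  repeat pLoops times
    get hrefByOffset(pText)
  end repeat
  put the milliseconds - tStart into tOffsetMs

  return tRegexMs & comma & tOffsetMs
end timeBoth
```

Calling timeBoth(yourSampleText, 10000) from the message box gives a first impression; results will depend on the data and the engine version.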
Re: Regex (MatchText) speed
On Wednesday, June 23, 2004, at 06:25 PM, Troy Rollins wrote:

> Anyone have a real-world sense of the speed difference? I have a parsing routine which I put together hastily, knowing that it would need to be later optimized. I'm edging in on that optimization phase, and I'm wondering what angle I might want to approach it. Speed is definitely a concern.

I tried fooling around with a few things to pull-parse non-SGML well-formed text. I got good results from offset() in some cases.

Here's this pull-parser stuff again:

-- put getElement("<record>", "</record>", tZap) into theElement
function getElement tStTag, tEdTag, stngToSch
  put empty into zapped
  put the number of chars in tStTag into dChars
  put offset(tStTag,stngToSch) into tNum1
  put offset(tEdTag,stngToSch) into tNum2
  if tNum1 < 1 then
    return "error"
    exit getElement
  end if
  if tNum2 < 1 then
    return "error"
    exit getElement
  end if
  put char (tNum1 + dChars) to (tNum2 - 1) of stngToSch into zapped
  return zapped
end getElement

-- put getAttribute("name", tZap) into theAttribute
function getAttribute tAttribute, strngToSearch
  put empty into zapA
  put quote into Qx
  if char 1 of tAttribute = space then
    put tAttribute & "=" & Qx into tAttributeX
  else
    put space & tAttribute & "=" & Qx into tAttributeX
  end if
  put the number of chars in tAttributeX into dChars
  put offset(tAttributeX,strngToSearch) into tNum1
  if tNum1 < 1 then
    return "error"
    exit getAttribute
  end if
  put tNum1 + dChars into tNumX
  put offset(Qx,strngToSearch,tNumX) into tNumZ
  if tNumX < 1 then
    return "error"
    exit getAttribute
  end if
  if tNumZ < 1 then
    return "error"
    exit getAttribute
  end if
  put char tNumX to (tNumX + (tNumZ - 1)) of strngToSearch into zapA
  return zapA
end getAttribute

-- put getElementsArray("<record>", "</record>", tZap) into theArray
function getElementsArray tStartTag, tEndTag, StringToSearch
  put empty into tArray
  put 0 into tStart1
  put 0 into tStart2
  put 1 into tElementNum
  put the number of chars in tStartTag into dChars
  repeat
    put offset(tStartTag,StringToSearch,tStart1) into tNum1
    put (tNum1 + tStart1) into tStart1
    if tNum1 < 1 then exit repeat
    put offset(tEndTag,StringToSearch,tStart2) into tNum2
    put (tNum2 + tStart2) into tStart2
    if tNum2 < 1 then exit repeat
    --if tNum2 < tNum1 then exit repeat
    put char (tStart1 + dChars) to (tStart2 - 1) of StringToSearch into zapped
    put zapped into tArray[tElementNum]
    add 1 to tElementNum
  end repeat
  return tArray
end getElementsArray

Mark
Re: Regex (MatchText) speed
On Jun 23, 2004, at 9:37 PM, Mark Brownell wrote:

>> Anyone have a real-world sense of the speed difference? I have a parsing routine which I put together hastily, knowing that it would need to be later optimized. I'm edging in on that optimization phase, and I'm wondering what angle I might want to approach it. Speed is definitely a concern.
>
> I tried fooling around with a few things to pull-parse non-SGML well-formed text. I got good results fro offset() in some cases.

I only wish I were parsing any kind of formed text. The stuff I am parsing is more like chaos. It can be anything from plain English error messages, to server directory lists, to date and time formats... with NO consistent formatting. The first job of my parser is to simply try to determine what the heck it is parsing. I currently have it functional, using if-else ifs, contains, etc. I'm considering replacing ALL of that with one powerful matchText, but if that only ends up costing me time rather than saving it...

--
Troy
RPSystems, Ltd.
http://www.rpsystems.net
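For readers weighing the same trade-off, a matchText-based "what am I looking at?" step could be sketched as below. The function name classifyLine, the three category labels, and all of the patterns are assumptions made up for illustration; they are not taken from the parser described above, and real input would need patterns tuned to it. The sketch assumes the engine's regex support accepts PCRE-style syntax such as \d, \b and (?i).

```livecode
-- Hypothetical classifier: returns a label describing what pLine looks like.
function classifyLine pLine
  if matchText(pLine, "^(\d{1,2})/(\d{1,2})/(\d{2,4})", tMonth, tDay, tYear) then
    -- Back references (tMonth, tDay, tYear) are filled in as a side effect.
    return "date"
  else if matchText(pLine, "^[-d][-rwx]{9}\s") then
    -- Looks like a Unix-style permissions column at the start of the line.
    return "directory listing"
  else if matchText(pLine, "(?i)\berror\b") then
    return "error message"
  end if
  return "unknown"
end classifyLine
```

Whether this beats a chain of contains tests is exactly what would need timing on real data.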
Re: Regex (MatchText) speed
On Wednesday, June 23, 2004, at 06:50 PM, Troy Rollins wrote:

> I only wish I were parsing any kind of formed text. The stuff I am parsing is more like chaos. It can be anything from plain English error messages, to server directory lists, to date and time formats... with NO consistent formatting. The first job of my parser is to simply try to determine what the heck it is parsing. I currently have it functional, using if-else ifs, contains, etc. I'm considering replacing ALL of that with one powerful matchText, but if that only ends up costing me time rather than saving it...

-- put getElement("<record>", "</record>", myText) into theElement
-- put getElementsArray("<record>", "</record>", myText) into theArray

These two functions work with any text for the start spot and any text for the end spot. If you were parsing an email message you could use something like this:

-- use the getElement() function from my last message.
-- put getElement("<record>", "</record>", tZap) into theElement

From: Troy Rollins [EMAIL PROTECTED]
Date: Wed Jun 23, 2004 6:50:59 PM US/Pacific
To: How to use Revolution [EMAIL PROTECTED]
Subject: Re: Regex (MatchText) speed
Reply-To: How to use Revolution [EMAIL PROTECTED]

put getElement("From:", "Date:", yourEmail) into mFrom
put getElement("Date:", "To:", yourEmail) into mDate
put getElement("To:", "Subject:", yourEmail) into mTo
put getElement("Subject:", "Reply-To:", yourEmail) into mSubject

The examples above would require stripping the line returns.

To get the body of the email you could use:

put getElement("Reply-To:", "From:", yourEmail) into mBody

...and strip the first line of mBody

Mark
Re: Regex (MatchText) speed
On Jun 24, 2004, at 1:08 AM, Mark Brownell wrote:

> These two functions work with any text for the start spot and any text for the end spot.

Thanks Mark. I'll look at this. And, while I don't doubt this has provided good results, may I ask why you use it instead of a matchText? You didn't indicate if you had done some form of speed test, or in fact ever compared the two alternatives. I'm just trying to get a sense of the value of matchText in terms of speed, or if it is a convenience for those more comfortable with regular expression syntax (which I can use... but I FIGHT with them every time.)

--
Troy
RPSystems, Ltd.
http://www.rpsystems.net
Re: Regex (MatchText) speed
Troy,

My 2 cents: it totally depends on what kind of RegEx you're ending up with. RegEx buys you power for expressing complex rules. A hand-crafted solution will almost always run faster; it's just a matter of whether you can afford to write new code for every case instead of just accumulating expressions and letting the RegEx engine do the work.

For example, if I wanted to remove HTML tags from text, I'd probably use something hand-crafted for speed. If I wanted to verify a valid URL in a strict standards-compliant sense, I'd probably drop in the nasty RegEx because I wouldn't trust my hand-crafted code to catch everything without a ton of effort.

- Brian
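As an illustration of the hand-crafted side of that trade-off, here is a minimal single-pass tag stripper in Transcript. The function name stripTags and the variable names are hypothetical, and it is deliberately naive: it treats every "<" as the start of a tag and knows nothing about comments, scripts, or ">" inside attribute values.

```livecode
-- Hypothetical helper: removes anything between "<" and ">" in one pass.
function stripTags pText
  put empty into tOut
  -- State variable: are we currently inside a tag?
  put false into tInTag
  repeat for each char c in pText
    if c is "<" then
      put true into tInTag
    else if c is ">" and tInTag then
      put false into tInTag
    else if not tInTag then
      put c after tOut
    end if
  end repeat
  return tOut
end stripTags
```

For example, stripTags("<b>Hi</b> there") returns "Hi there".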