On Sep 18, 2007, at 3:10 PM, Peter N Lewis wrote:

I'm looking for a regex pattern that will find quoted strings (double quotes) but skip (double-)quoted strings containing any of the following characters: $'"\ (dollar sign, single quote, double quote, backslash)

At first I tried "[^\$'"\\]+?" but it was matching the end of one quoted string and the beginning of the next...
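Here is a minimal Python sketch of what goes wrong with that first attempt. Python's re module stands in for BBEdit's grep engine, and the sample inputs are made up. When a string is rejected, the engine simply tries the next double quote, and that quote may be a closing one:

```python
import re

# The first attempt from the question, as a Python raw string.
# Inside [...], \$ is a literal dollar sign and \\ a literal backslash.
FIRST_ATTEMPT = re.compile(r'''"[^\$'"\\]+?"''')

# Two clean strings side by side: both are found.
print(FIRST_ATTEMPT.findall('f("a", "b")'))   # ['"a"', '"b"']

# The first string contains $, so the match starts at its CLOSING quote
# and runs to the OPENING quote of the next string.
print(FIRST_ATTEMPT.findall('f("$a", "b")'))  # ['", "']
```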

(?<!\\)"[^\$'"\\\r\n]*(?<!\\)"

First off, (?<!\\) is a negative lookbehind assertion that the character before the " is not a backslash.

I've added \r and \n into the exclusion, which should stop any cases crossing line endings.

And * means zero or more, which is equivalent to +? which means optionally 1 or more.
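The same kind of sketch works for the revised pattern. It again uses Python's re module, which supports the fixed-width lookbehind (?<!\\), with made-up sample input. It shows why \r\n was added to the excluded characters, and how the lookbehind stops an escaped quote from being used as a string boundary:

```python
import re

# The revised pattern from the reply, as a Python raw string.
PATTERN = re.compile(r'''(?<!\\)"[^\$'"\\\r\n]*(?<!\\)"''')
# Variants with one safeguard removed, for comparison.
NO_NEWLINE_GUARD = re.compile(r'''(?<!\\)"[^\$'"\\]*(?<!\\)"''')
NO_LOOKBEHIND = re.compile(r'''"[^\$'"\\\r\n]*"''')

two_lines = 's = "$a";\nt = "b";'
print(PATTERN.findall(two_lines))           # ['"b"']
print(NO_NEWLINE_GUARD.findall(two_lines))  # ['";\nt = "']  (crosses the line break)

escaped = r'echo "say \"hi\"";'
print(PATTERN.findall(escaped))             # []
print(NO_LOOKBEHIND.findall(escaped))       # ['""']  (starts on the escaped \")
```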

Thanks for the help on this. That is definitely better than anything I had come up with, but I found some more strings that were not matching quite like I expected...


$this->Error("No database connection set: use ADOdb_Active_Record::SetDatabaseAdaptor(\$db)", "DB");

<form action="<?php echo $myself.$XFA["update"]; ?>" method="post">

<input type="hidden" name="<?php echo $somevariable; ?>" value="1"/>

<input type="submit" value="<?php echo $is_ok; ?>"/> <input type="button" value="<?php echo $is_not_ok; ?>" onclick="window.location='<?php echo $_SERVER["HTTP_REFERER"]; ?>';"/>
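To see what the pattern from the first reply actually does with these lines, here is a small Python sketch that runs it over the first three of them. Python's re module stands in for BBEdit's grep, on the assumption that it behaves the same for these constructs:

```python
import re

# The revised pattern from the first reply, as a Python raw string.
PATTERN = re.compile(r'''(?<!\\)"[^\$'"\\\r\n]*(?<!\\)"''')

# The first three test lines above, copied verbatim (raw strings keep \$ intact).
lines = [
    r'$this->Error("No database connection set: use ADOdb_Active_Record::SetDatabaseAdaptor(\$db)", "DB");',
    r'<form action="<?php echo $myself.$XFA["update"]; ?>" method="post">',
    r'<input type="hidden" name="<?php echo $somevariable; ?>" value="1"/>',
]

for line in lines:
    print(PATTERN.findall(line))
# ['", "']
# ['"update"', '" method="']
# ['"hidden"', '" value="']
```

In each line, the unexpected match starts on the closing quote of a string that was rejected (because it contains $ or \) and ends on the opening quote of the next string. The pattern has no way of telling which quotes open a string and which close one.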


I'm thinking I need some way of saying the string should be preceded by <?php at some point and NOT by ?>, and can optionally end with ?> (because in PHP you can leave the closing ?> at the end of a file off if you are not switching back to HTML).
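One way to express "preceded by <?php and not by ?>" is to use two passes instead of one regex. First cut out the PHP regions, then search only inside them. Below is a Python sketch of that idea. The helper name php_double_quoted and the sample text are made up for illustration. The sketch assumes that ?> never appears inside a PHP string or comment, and it ignores short tags such as <?=:

```python
import re

# Hypothetical helper: a PHP region starts at "<?php" and ends at the next "?>",
# or at end of file (\Z) when the closing tag is omitted.
PHP_REGION = re.compile(r'<\?php(.*?)(?:\?>|\Z)', re.S)
PATTERN = re.compile(r'''(?<!\\)"[^\$'"\\\r\n]*(?<!\\)"''')

def php_double_quoted(text):
    """Return candidate strings found only inside PHP regions."""
    found = []
    for region in PHP_REGION.finditer(text):
        found.extend(PATTERN.findall(region.group(1)))
    return found

sample = '<a href="x.html"><?php echo $XFA["update"]; ?></a>\n<?php $s = "done";'
print(php_double_quoted(sample))  # ['"update"', '"done"']
```

The quoted HTML attribute "x.html" is skipped because it lies outside both regions. Inside a region, though, the pattern can still run from one string into the next.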

I always thought a question mark following a quantifier was the non-greedy modifier, telling the pattern to stop at the first place it can match, or something like that...
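That is how Perl-style engines such as PCRE and Python's re treat it. A ? after a quantifier makes the quantifier lazy (as few characters as possible) rather than optional. A quick Python check with made-up sample strings:

```python
import re

text = '"a" . "b"'

# Greedy: .+ grabs as much as possible, then backs off to the LAST quote.
print(re.findall(r'".+"', text))    # ['"a" . "b"']
# Lazy: .+? takes as little as possible, so each match stops at the NEXT quote.
print(re.findall(r'".+?"', text))   # ['"a"', '"b"']

# * (zero or more) and +? (one or more, lazy) are not interchangeable:
# only * can match the empty string "".
print(re.findall(r'"[^"]*"', '""'))   # ['""']
print(re.findall(r'"[^"]+?"', '""'))  # []
```

In the patterns in this thread, the character class already excludes ", so + and +? can only stop at the same quote and return identical matches. The practical difference between * and +? there is that only * finds the empty string "".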

I'm looking for ways to speed up PHP applications. I've read that using single instead of double quotes can help, because there is less for the parser to do: single-quoted strings are not scanned for variables or escape sequences. At present, the search pattern returns more than 38,000 matches for one project (though this includes quoted attributes in plain HTML). If this theory is true, one would think we'd see a speed-up in the application. I hope to post the results once I've found a completely reliable regex (assuming one even exists).

Any idea why these additional strings are failing? Might it have something to do with the escaped dollar sign or the parentheses? I think Ronald Kimball has a point: the regex needs to be told what not to match as well, but since there are so many potential options, that's a really tall order... I was hoping for something similar!
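One way to tell the regex what not to match is to match every complete string literal from left to right, so that a match can never begin on a closing quote. Of those literals, you keep only the double-quoted ones that contain none of $ ' " \. This is a different technique from the patterns above. The Python sketch below uses made-up names (TOKEN, clean_double_quoted, to_single_quotes) and assumes the input is plain PHP code, with no HTML, heredocs, or quotes inside comments:

```python
import re

# Each alternative consumes a WHOLE string literal, so after one literal ends
# the scan resumes outside any string. Only the middle alternative has a group.
TOKEN = re.compile(r'''
      '(?:[^'\\]|\\.)*'      # single-quoted string: consume, ignore
    | "([^$'"\\\r\n]*)"      # double-quoted, no $ ' " \ or line break: keep
    | "(?:[^"\\]|\\.)*"      # any other double-quoted string: consume, ignore
''', re.VERBOSE | re.DOTALL)


def clean_double_quoted(php_code):
    """List double-quoted literals free of $, quotes, backslashes and line breaks."""
    return [m.group(0) for m in TOKEN.finditer(php_code)
            if m.group(1) is not None]


def _swap_quotes(m):
    # Rewrite only the "clean" literals; leave every other literal untouched.
    if m.group(1) is None:
        return m.group(0)
    return "'" + m.group(1) + "'"


def to_single_quotes(php_code):
    return TOKEN.sub(_swap_quotes, php_code)


line = r'$this->Error("No database connection set: use ADOdb_Active_Record::SetDatabaseAdaptor(\$db)", "DB");'
print(clean_double_quoted(line))   # ['"DB"']
print(to_single_quotes('echo "ok", "$x", \'say "hi"\';'))
# echo 'ok', "$x", 'say "hi"';
```

Converting only those literals should not change what the strings contain. PHP reads a single-quoted string that has no backslashes or quotes literally, and a double-quoted string with no $ or backslash has nothing to interpolate.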

Thanks again, in advance!



It works on your test case, finding only the last two strings. It is unlikely to work perfectly in every possible case, but if you find a case that fails, perhaps it can be handled too.

Enjoy,
   Peter.
PS: Top posting is good for cases like this.


--
------------------------------------------------------------------
Have a feature request? Not sure the software's working correctly?
If so, please send mail to <[EMAIL PROTECTED]>, not to the list.
List FAQ: <http://www.barebones.com/support/lists/bbedit_talk.shtml>
List archives: <http://www.listsearch.com/BBEditTalk.lasso>
To unsubscribe, send mail to:  <[EMAIL PROTECTED]>
