On Sep 18, 2007, at 3:10 PM, Peter N Lewis wrote:
I'm looking for a regex pattern that will find quoted strings
(double quotes) but skip (double-)quoted strings containing any of
the following characters: $'"\ (dollar sign, single quote, double
quote, backslash)
At first I tried "[^\$'"\\]+?" but it was matching the end of one
quoted string and the beginning of the next...
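A minimal Python sketch of why that first pattern misfires. Python's re module is only standing in for BBEdit's PCRE-style grep here, and the sample line is made up for illustration. The first string contains a $ and is correctly rejected, but its closing quote is then free to act as an opening quote:

```python
import re

# The first attempt from the message above, unchanged.
first_try = re.compile(r'''"[^\$'"\\]+?"''')

# Hypothetical sample line: the first string should be skipped (it contains $).
sample = 'x = "a $b" . "c"'

# The match runs from the END of "a $b" to the START of "c".
print(first_try.findall(sample))   # ['" . "']
```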
(?<!\\)"[^\$'"\\\r\n]*(?<!\\)"
First off, (?<!\\) is a negative lookbehind assertion that the
character before the " is not a backslash.
I've added \r and \n to the exclusion, which should stop a match
from running across a line ending.
And * means zero or more, which is equivalent to +? which means
optionally 1 or more.
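To see what the \r and \n exclusion buys, here is a hedged comparison in Python (again standing in for BBEdit's grep; the two-line sample is invented for illustration). The revised pattern still pairs quotes purely by position, so on a single line the closing quote of a rejected string can still be picked up as an opening quote.

```python
import re

first_try = re.compile(r'''"[^\$'"\\]+?"''')
# Revised pattern: lookbehinds reject escaped quotes, and \r \n are excluded.
revised = re.compile(r'''(?<!\\)"[^\$'"\\\r\n]*(?<!\\)"''')

# Hypothetical two-line sample; only "two" is free of $ ' " and \.
sample = 'a = "one $x"\nb = "two"'

print(first_try.findall(sample))  # ['"\nb = "']  (runs across the line break)
print(revised.findall(sample))    # ['"two"']
```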
Thanks for the help on this. That is definitely better than anything
I had come up with, but I found some more strings that were not
matching quite like I expected...
$this->Error("No database connection set: use ADOdb_Active_Record::SetDatabaseAdaptor(\$db)", "DB");
<form action="<?php echo $myself.$XFA["update"]; ?>" method="post">
<input type="hidden" name="<?php echo $somevariable; ?>" value="1"/>
<input type="submit" value="<?php echo $is_ok; ?>"/> <input
type="button" value="<?php echo $is_not_ok; ?>"
onclick="window.location='<?php echo $_SERVER["HTTP_REFERER"]; ?>';"/>
I'm thinking I need some way of saying the string should be preceded
by <?php at some point and NOT by ?>, and can optionally end with ?>
(because in PHP you can leave the closing ?> at the end of a file off
if you are not switching back to HTML).
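One way to approximate "after <?php and not after ?>" is to use two passes instead of one regex: first pull out the PHP blocks, then look for quoted strings only inside them. The Python sketch below illustrates the idea. The names PHP_BLOCK, QUOTED, and quoted_in_php are made up here, and it assumes that ?> never appears inside a PHP string or comment and that only the long <?php opening tag is used.

```python
import re

# A PHP block runs from <?php to the next ?> or, if there is none,
# to the end of the file (the closing tag is optional there).
PHP_BLOCK = re.compile(r'<\?php(.*?)(?:\?>|\Z)', re.DOTALL)

# The quoted-string pattern from earlier in the thread.
QUOTED = re.compile(r'''(?<!\\)"[^\$'"\\\r\n]*(?<!\\)"''')

def quoted_in_php(source):
    """Return candidate double-quoted strings found inside PHP blocks only."""
    hits = []
    for block in PHP_BLOCK.finditer(source):
        hits.extend(QUOTED.findall(block.group(1)))
    return hits

sample = (
    '<input type="submit" value="<?php echo $is_ok; ?>"/>\n'
    '<?php echo "plain";'
)
print(quoted_in_php(sample))  # ['"plain"']
```

The HTML attribute quotes are never seen by QUOTED because they sit outside every block. Inside a block the pattern keeps its usual limits, so quotes are still paired by position only.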
I always thought the question mark following a quantifier was the
non-greedy modifier, telling the pattern to stop at the first
possible match rather than the longest one...
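For what it's worth, that is how PCRE-style engines treat a ? placed after a quantifier: it makes the quantifier lazy, it does not make it optional. A short Python sketch (the sample strings are made up) shows the difference, and also why * and +? are not quite interchangeable: with a negated character class, greedy and lazy find the same strings, but only * accepts an empty string.

```python
import re

text = 'a "b" c "d"'
print(re.findall(r'".+"', text))    # greedy: ['"b" c "d"']
print(re.findall(r'".+?"', text))   # lazy:   ['"b"', '"d"']

empty = 'say "" now'
print(re.findall(r'"[^"]*"', empty))   # zero or more: ['""']
print(re.findall(r'"[^"]+?"', empty))  # one or more, lazy: []
```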
I'm looking for ways to speed up PHP applications. I've read that
using single quotes instead of double quotes can help because the
parser has less work to do on them. At present, the search pattern
returns more than 38,000 matches for one project (but this includes
matches of quoted attributes in straight HTML). If this theory is
true, one would think we'd see a speed-up in the application. I hope
to post the results when I've found a completely reliable regex
(assuming that even exists).
Any idea why these additional strings are failing? Might it have
something to do with the escaped dollar sign or the parentheses? I
think Ronald Kimball has a point: the regex needs to be told what not
to match as well, but since there are so many potential options,
that's a really tall order... I was hoping for something simpler!
Thanks again, in advance!
It works in your test case, finding only the last two strings. It
is unlikely to work perfectly in every possible case, but if you
find a case that fails, perhaps it can also be handled.
Enjoy,
Peter.
PS: Top posting is good for cases like this.