On 15 March 2011 12:41, Ben Schmidt <mail_ben_schm...@yahoo.com.au> wrote:
>>>>>    static $re = '/(^|[^\\\\])\'/';
>>>
>>> Did no one see why the regex was wrong?
>
> I saw what the regex was. I didn't think like you that it was 'wrong'.
>
> Once you unescape the characters in the PHP single-quoted string above
> (where two backslashes count as one, and backslash-quote counts as a
> quote), the actual pattern that reaches the preg_replace function is:
>
>   /(^|[^\\])'/
>
>>> RegexBuddy (a windows app) explains regexes VERY VERY well.
>
> What kind of patterns? Does it support PCRE ones?
>

Yep and MANY other flavours (C#, C++, Delphi, Groovy, Java,
JavaScript, MySQL, ...)

>> The important bit (where the problem lies with regard to the regex) is
>> ...
>>
>> Match a single character NOT present in the list below «[^\\\\]»
>>         A \ character «\\»
>>         A \ character «\\»
>
> This is not the case.
>
> 1. As above, the pattern reaching preg_replace is /(^|[^\\])'/
>
> 2. PCRE, unlike many other regular expression implementations, allows
> backslash-escaping inside character classes (square brackets). So the
> doubled backslash only actually counts as a single backslash character
> to be excluded from the set of characters the atom will match.
>
> There is no error here. (And even if there were two backslashes being
> excluded, of course, it wouldn't hurt anything or change the meaning of
> the pattern.)
>
>> The issue is the word _single_.
>
> I don't think anybody thought otherwise.
>
> The problem was that, to a casual observer, the pattern seems to mean "a
> quote which doesn't already have a backslash before it". I believe this
> was its intent. (And the replacement added the 'missing' backslash.)
>
> But the pattern doesn't mean that. It actually means "a character which
> isn't a backslash, followed by a quote". This is subtly different.
>
> And it's most noticeable when two quotes follow each other in the
> subject string. In
>
>   str''str
>
> first the pattern matches "r'" (non-backslash followed by quote), and
> then it keeps searching from that point, i.e. it searches "'str". Since
> this isn't the beginning of the string, and there is no quote following
> a non-backslash character, there are no further matches.
>
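
A minimal sketch to check this behaviour. The original replacement string is
not shown in this thread, so $replacement below is a guess: it re-inserts the
captured character and puts a backslash before the quote.

```php
<?php
// The pattern from the thread; after PHP unescaping, PCRE sees /(^|[^\\])'/
$re = '/(^|[^\\\\])\'/';

// Assumed replacement (not from the thread): captured char, backslash, quote.
// In a preg replacement a literal backslash is written doubled, hence the
// extra backslashes in the single-quoted PHP string.
$replacement = '$1\\\\\'';

$subject = "str''str";

$count  = preg_match_all($re, $subject, $matches);
$result = preg_replace($re, $replacement, $subject);

echo $count, PHP_EOL;   // number of matches found
echo $result, PHP_EOL;  // only the first quote gains a backslash
```

The single match is "r'", so the second quote is left unescaped.
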
> Now, here is a pattern which actually means "a quote which doesn't
> already have a backslash before it" which is achieved by means of a
> lookbehind assertion, which, even when searching the string after the
> first match, "'str", still 'looks back' on the earlier part of the
> string to recognise the second quote is not preceded by a backslash and
> match a second time:
>
>   /(^|(?<!\\))'/
>
> As a PHP single-quoted string this is:
>
>   '/(^|(?<!\\\\))\'/'
>
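
A quick sketch of the lookbehind version. The replacement is again an
assumption (a backslash followed by a quote), since the real one is not
quoted in this thread.

```php
<?php
// Lookbehind pattern from the thread; PCRE sees /(^|(?<!\\))'/
$re = '/(^|(?<!\\\\))\'/';

// Assumed replacement (not from the thread): a backslash followed by a quote.
$replacement = '\\\\\'';

// Two adjacent quotes: both should be matched and escaped.
$doubled = preg_replace($re, $replacement, "str''str");

// A quote that already has a backslash before it: should be left alone.
$escaped = preg_replace($re, $replacement, "it\\'s");

echo $doubled, PHP_EOL;
echo $escaped, PHP_EOL;
```

Because the lookbehind consumes no characters, the second quote can still be
matched after the first one has been replaced.
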
> Hope this helps,
>
> Ben.
>

If I say ...

<?php
echo  '/(^|[^\\\\])\'/';
?>

I get ...

/(^|[^\\])'/


which is explained as ...



(^|[^\\])'

Options: case insensitive; ^ and $ match at line breaks

Match the regular expression below and capture its match into
backreference number 1 «(^|[^\\])»
   Match either the regular expression below (attempting the next
alternative only if this one fails) «^»
      Assert position at the beginning of a line (at beginning of the
string or after a line break character) «^»
   Or match regular expression number 2 below (the entire group fails
if this one fails to match) «[^\\]»
      Match any character that is NOT a \ character «[^\\]»
Match the character “'” literally «'»

And that certainly makes a LOT more sense.

Decoding regexes and handling the escaping needed for the language is
a real headache sometimes.

Just imagine creating regex code for use by client-side JavaScript using PHP.

Eight \ in a row to match a single \ wouldn't be impossible.
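
To make the count concrete, here is a rough sketch (the variable name `re` and
the one-line script are made up for illustration). A JavaScript regex needs
two backslashes to match one, the JavaScript string literal handed to
new RegExp doubles that to four, and the PHP single-quoted string that emits
it doubles it again to eight.

```php
<?php
// PHP source: eight backslashes between the double quotes.
$js = 'var re = new RegExp("\\\\\\\\");';

// The emitted JavaScript contains four backslashes; as a JavaScript string
// that is two, which as a regex matches one literal backslash.
echo $js, PHP_EOL;

// Count the backslashes that actually reach the browser.
$backslashes = substr_count($js, '\\');
echo $backslashes, PHP_EOL;
```
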

Sorry for the confusion.


-- 
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
