On Mon, Mar 07, 2016 at 05:44:08PM -0800, [email protected] wrote:
> I'm trying to replace *[URL]www.link.com[/URL]* with HTML with this regexp:
> 
> topic.text = re.sub("(\[URL\])(.*)(\[\/URL\])", '<a href="$2">$2</a>',
>                     topic.text, flags=re.I)
> 
> But it's giving me the following problems:
> 
>    1. The $2 capture group is only able to be repeated once, so I get 
>    <a href="www.link.com">$2</a>
>    instead of 
>    <a href="www.link.com">www.link.com</a>

I have my doubts: if you use the standard Python re library, the way
to refer to captured groups is "\1", "\2", etc., not "$1" or "$2".
When I try the code you posted above, not even the first occurrence of
"$2" gets substituted::

    >>> re.sub("(\[URL\])(.*)(\[\/URL\])", '<a href="$2">$2</a>',
    ...        '[URL]www.link.com[/URL]', flags=re.I)
    '<a href="$2">$2</a>'

To make the substitution work for a single occurrence of
[URL]...[/URL], use "\2" instead, as in the example below. Also, when
writing regular expressions, or any other strings that are supposed to
contain backslashes, it is a good idea to write them as raw string
literals, i.e. prefix them with an "r", which I've done below. That
way, Python won't try to interpret the backslashes as escape
sequences; otherwise, "\2" would become a single character with an
ASCII value of 2::

    >>> re.sub(r"(\[URL\])(.*)(\[\/URL\])", r'<a href="\2">\2</a>',
    ...        '[URL]www.link.com[/URL]', flags=re.I)
    '<a href="www.link.com">www.link.com</a>'
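
To illustrate the raw-string point, here is a small sketch (the
variable names are mine, purely for illustration) comparing what
Python and re.sub actually see with and without the "r" prefix on the
replacement string:

```python
import re

# Without the r prefix, Python turns "\2" into one control character.
plain = '\2'
raw = r'\2'

print(len(plain), repr(plain))  # 1 '\x02'
print(len(raw), repr(raw))      # 2 '\\2'

pattern = r"(\[URL\])(.*)(\[\/URL\])"
text = '[URL]www.link.com[/URL]'

# Raw replacement: re.sub receives a backslash and "2", i.e. a
# backreference to the second group.
good = re.sub(pattern, r'<a href="\2">\2</a>', text, flags=re.I)

# Non-raw replacement: re.sub receives chr(2) and inserts it literally.
bad = re.sub(pattern, '<a href="\2">\2</a>', text, flags=re.I)

print(repr(good))  # '<a href="www.link.com">www.link.com</a>'
print(repr(bad))   # '<a href="\x02">\x02</a>'
```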

>    2. Only the first *[URL]* is matched. Everything after the first *[/URL]* 
>    is simply deleted...

The solution above gets you halfway there. re.sub replaces all
matches by default; the problem here is that the "(.*)" part of your
regex matches everything between the first "[URL]" and the last
"[/URL]"::

    >>> re.sub(r"(\[URL\])(.*)(\[\/URL\])", r'<a href="\2">\2</a>',
    ...        '[URL]www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com[/URL]',
    ...        flags=re.I)
    '<a href="www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com">www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com</a>'

The reason is that the asterisk operator in a regex is greedy, which
means a ".*" will try to match as much as possible. When you use the
non-greedy version of the operator (which you get by putting a
question mark after the asterisk), you get the result you want::

    >>> re.sub(r"(\[URL\])(.*?)(\[\/URL\])", r'<a href="\2">\2</a>',
    ...        '[URL]www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com[/URL]',
    ...        flags=re.I)
    '<a href="www.link1.com">www.link1.com</a><a href="www.link2.com">www.link2.com</a><a href="www.link3.com">www.link3.com</a>'
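
If it helps to see the greedy/non-greedy difference in isolation,
re.findall shows what the capture group grabs in each case. This is
just a minimal sketch with a shortened two-link input of my own, not
your actual data:

```python
import re

text = '[URL]www.link1.com[/URL][URL]www.link2.com[/URL]'

# Greedy: ".*" runs to the last "[/URL]", so there is one big match.
greedy = re.findall(r"\[URL\](.*)\[/URL\]", text)

# Non-greedy: ".*?" stops at the nearest "[/URL]", one match per link.
lazy = re.findall(r"\[URL\](.*?)\[/URL\]", text)

print(greedy)  # ['www.link1.com[/URL][URL]www.link2.com']
print(lazy)    # ['www.link1.com', 'www.link2.com']
```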


You can read an explanation of the difference between greedy and
non-greedy regular expressions in the Python docs:
https://docs.python.org/2/howto/regex.html#greedy-versus-non-greedy
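
As a side note, the groups around "[URL]" and "[/URL]" aren't needed
for the substitution, so the pattern can be reduced to a single
capture group. Below is a rough sketch of how I might wrap this up;
the function name convert_url_tags and the constant URL_TAG_RE are
made up for this example, and it assumes the tags are never nested:

```python
import re

# Hypothetical names, for illustration only. A single non-greedy
# capture group holds the text between the tags.
URL_TAG_RE = re.compile(r"\[URL\](.*?)\[/URL\]", re.IGNORECASE)


def convert_url_tags(text):
    """Replace every [URL]...[/URL] pair in text with an HTML link."""
    return URL_TAG_RE.sub(r'<a href="\1">\1</a>', text)


sample = 'See [URL]www.link1.com[/URL] and [url]www.link2.com[/url].'
result = convert_url_tags(sample)
print(result)
```

In your code this would then be used as
topic.text = convert_url_tags(topic.text).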

Good luck,

Michal

>    
> I hope someone can help me with this. I'm using Python 2.7 if it makes a 
> difference.

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/20160308084020.GE25061%40koniiiik.org.
For more options, visit https://groups.google.com/d/optout.
