Your improvements work great, thank you. And thank you for the very detailed explanations!
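For reference, here is a minimal sketch of Michal's two fixes combined into one helper. It assumes the goal is still to rewrite "topic.text" as in the original question. The names "URL_TAG" and "bbcode_urls_to_html" are hypothetical. Unlike the quoted examples, the pattern keeps only one capture group, so the backreference is "\1" rather than "\2". The code avoids Python 3-only syntax, so it should also run on Python 2.7.

```python
import re

# Hypothetical pattern name. "(.*?)" is non-greedy, so it stops at the first
# [/URL] after each [URL]; re.IGNORECASE also matches [url]...[/url].
URL_TAG = re.compile(r"\[URL\](.*?)\[/URL\]", flags=re.IGNORECASE)


def bbcode_urls_to_html(text):
    """Turn every [URL]...[/URL] pair in text into an <a> tag (hypothetical helper)."""
    # The template is a raw string, so \1 reaches re.sub as a backreference
    # to the single capture group instead of becoming the character chr(1).
    return URL_TAG.sub(r'<a href="\1">\1</a>', text)


print(bbcode_urls_to_html("[URL]www.link1.com[/URL] and [url]www.link2.com[/url]"))
```

In the original code this would become "topic.text = bbcode_urls_to_html(topic.text)". The captured text is inserted as-is, without any HTML escaping.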
On Tuesday, March 8, 2016 at 9:41:11 AM UTC+1, Michal Petrucha wrote:
>
> On Mon, Mar 07, 2016 at 05:44:08PM -0800, [email protected] wrote:
> > I'm trying to replace *[URL]www.link.com[/URL]* with HTML with this
> > regexp:
> >
> > topic.text = re.sub("(\[URL\])(.*)(\[\/URL\])", '<a href="$2">$2</a>', topic.text, flags=re.I)
> >
> > But it's giving me the following problems:
> >
> > 1. The $2 capture group is only able to be repeated once, so I get
> >    <a href="www.link.com">$2</a>
> >    instead of
> >    <a href="www.link.com">www.link.com</a>
>
> I have my doubts – if you use the standard Python re library, then the
> way to refer to captured groups is "\1", "\2", etc., not "$1". When I
> try the code you posted above, I get the following result (i.e., not
> even the first occurrence of "$2" gets substituted)::
>
>     >>> re.sub("(\[URL\])(.*)(\[\/URL\])", '<a href="$2">$2</a>', '[URL]www.link.com[/URL]', flags=re.I)
>     '<a href="$2">$2</a>'
>
> In order to make the substitution work for a single occurrence of
> [URL]...[/URL], you can use the following, which uses "\2" (also, when
> writing regular expressions, or other strings that are supposed to
> contain the backslash character, it is a good idea to write them as
> raw string literals, i.e. prefix them with an "r", which I've done
> below; that way, Python won't try to interpret the backslashes as
> special characters – otherwise, "\2" would become a character with an
> ASCII value of 2)::
>
>     >>> re.sub(r"(\[URL\])(.*)(\[\/URL\])", r'<a href="\2">\2</a>', '[URL]www.link.com[/URL]', flags=re.I)
>     '<a href="www.link.com">www.link.com</a>'
>
> > 2. Only the first *[URL]* is matched. Everything after the first *[/URL]*
> >    is simply deleted...
>
> The solution above gets you halfway there – re.sub will replace all
> matches by default; the problem here is that the "(.*)" part of your
> regex will match everything between the first "[URL]" and the last
> "[/URL]"::
>
>     >>> re.sub(r"(\[URL\])(.*)(\[\/URL\])", r'<a href="\2">\2</a>', '[URL]www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com[/URL]', flags=re.I)
>     '<a href="www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com">www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com</a>'
>
> The reason is that the asterisk operator in a regex is greedy, which
> means a ".*" will try to match as much as possible. When you use the
> non-greedy version of the operator (which you get by putting a
> question mark after the asterisk), you get the result you want::
>
>     >>> re.sub(r"(\[URL\])(.*?)(\[\/URL\])", r'<a href="\2">\2</a>', '[URL]www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com[/URL]', flags=re.I)
>     '<a href="www.link1.com">www.link1.com</a><a href="www.link2.com">www.link2.com</a><a href="www.link3.com">www.link3.com</a>'
>
> You can read an explanation of the difference between greedy and
> non-greedy regular expressions in the Python docs:
> https://docs.python.org/2/howto/regex.html#greedy-versus-non-greedy
>
> Good luck,
>
> Michal
>
> > I hope someone can help me with this. I'm using Python 2.7 if it makes a
> > difference.
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Django users" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > To post to this group, send email to [email protected].
> > Visit this group at https://groups.google.com/group/django-users.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/fce5a726-8a4c-455a-a978-6ee70d66464e%40googlegroups.com.
> >
> > For more options, visit https://groups.google.com/d/optout.
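The two points from the thread can also be checked in isolation. The short sketch below uses a made-up sample string. It uses re.findall to show what the greedy and non-greedy patterns each capture, and it confirms that a non-raw "\2" is the single character chr(2). It also shows the named-group form "\g<url>", which re.sub accepts in the replacement template as an alternative to numbered backreferences. The group name "url" is only illustrative.

```python
import re

sample = "[URL]a.com[/URL][URL]b.com[/URL]"  # illustrative input

# Greedy ".*" runs on to the LAST [/URL], so it captures one long span.
greedy = re.findall(r"\[URL\](.*)\[/URL\]", sample, flags=re.I)

# Non-greedy ".*?" stops at the FIRST [/URL] after each [URL].
lazy = re.findall(r"\[URL\](.*?)\[/URL\]", sample, flags=re.I)

# Without the r prefix, "\2" is an octal escape for chr(2), not a backreference.
plain_template = '<a href="\2">\2</a>'
raw_template = r'<a href="\2">\2</a>'

# A named group (?P<url>...) can be referenced as \g<url> in the template.
named = re.sub(r"\[URL\](?P<url>.*?)\[/URL\]",
               r'<a href="\g<url>">\g<url></a>', sample, flags=re.I)

print(greedy)  # one match spanning both tags
print(lazy)    # one match per [URL]...[/URL] pair
print(named)
```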

