On Fri, Feb 22, 2008 at 7:37 PM, Mark Sapiro <[EMAIL PROTECTED]> wrote:
> >Gets converted into:
> >
> >   this is another url <A
> >HREF="http://www.yahoo.com,">http://www.yahoo.com,</A>
> >   and so is this <A
> >HREF="http://www.ibm.com">http://www.google.com</A>.
>
> I assume that's a typo and 'ibm' should be 'google'.
:-) Yep. I had used www.ibm.com and www.mbi.com in my test and changed
them to G! and Y! for the email, but missed one reference.

> >So, the problem seems to appear with commas too, which makes me wonder
> >if this can be resolved with this:
> >
> >    urlpat = re.compile(r'(\w+://[^>)\s]+?)(\.|,)?(\s|$)') # URLs in text
> >
> >but then I got to thinking about any other punctuation mark that
> >follows a URL... and my mind started spinning :-)
>
> I think the suggestion above - (\.|,)? would work for comma, but you
> could do it other ways - e.g.
>
>     urlpat = re.compile(r'(\w+://[^>)\s]+?)[.,;]?(\s|$)') # URLs in text
>
> to handle '.', ',' and ';', and you could extend that with more
> characters, but you really need to be careful. Consider, for example,
> <http://www.example.com/some/page#anchor.> which could be a valid URL
> ending in '.'.

Understood. I think the "[.,;]" would cover 99% of the possibilities
of a URL in a sentence.

Thanks again!
-Jim P.
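
For reference, a minimal sketch of how the pattern discussed above might be tried out on its own. This is not the Mailman archiver code: the linkify() helper is hypothetical, and the pattern is a variant in which the optional trailing punctuation gets its own capture group so that it can be written back after the link.

```python
import re

# Variant of the pattern from this thread. Assumption: the optional trailing
# '.', ',' or ';' is captured in group 2 so the substitution can restore it.
urlpat = re.compile(r'(\w+://[^>)\s]+?)([.,;]?)(\s|$)')  # URLs in text


def linkify(text):
    """Hypothetical helper: wrap each URL in an anchor tag and keep any
    trailing '.', ',' or ';' outside the link."""
    # \1 = URL, \2 = optional punctuation, \3 = whitespace (or empty at end)
    return urlpat.sub(r'<A HREF="\1">\1</A>\2\3', text)


if __name__ == '__main__':
    print(linkify('this is another url http://www.yahoo.com, '
                  'and so is this http://www.google.com.'))
    # The caveat from the thread: a URL that really ends in '.' loses it.
    print(linkify('see http://www.example.com/some/page#anchor.'))
```

In the first call the comma and the final period end up outside the anchors. The second call shows the trade-off mentioned above: the '.' after "anchor" is treated as sentence punctuation even if it was meant to be part of the URL.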