On Fri, Feb 22, 2008 at 7:37 PM, Mark Sapiro <[EMAIL PROTECTED]> wrote:
>  >Gets converted into:
>  >   this is another url <A
>  >HREF="http://www.yahoo.com,">http://www.yahoo.com,</A>
>  >            and so is this <A
>  >HREF="http://www.ibm.com">http://www.google.com</A>.
>
>
>  I assume that's a typo and 'ibm' should be 'google'.

:-) Yep.  I had used www.ibm.com and www.mbi.com in my test and
changed them to G! and Y! for the email, but missed one reference.

>  >So, the problem seems to appear with commas too which makes me wonder
>  >if this can be resolved with this:
>  >
>  >   urlpat = re.compile(r'(\w+://[^>)\s]+?)(\.|,)?(\s|$)') # URLs in text
>  >
>  >but then I got to thinking about any other punctuation mark that
>  >follows a URL... and my mind started spinning :-)
>
>
>  I think the suggestion above - (\.|,)? would work for comma, but you
>  could do it other ways - e.g.
>
>
>    urlpat = re.compile(r'(\w+://[^>)\s]+?)[.,;]?(\s|$)') # URLs in text
>
>  to handle '.', ',' and ';', and you could extend that with more
>  characters, but you really need to be careful. Consider for example,
>  <http://www.example.com/some/page#anchor.> which could be a valid URL
>  ending in '.'.

Understood.  I think the "[.,;]" would cover 99% of the possibilities
of a URL in a sentence.
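
For anyone following along, here is a rough sketch of how the "[.,;]"
pattern behaves on the test sentences from this thread. It is not the
actual archiver code: the linkify() helper and the replacement template
are hypothetical, and the punctuation class is wrapped in a capturing
group (an assumption added here) so the trailing character can be put
back after the closing tag.

```python
import re

# Pattern from the thread, with one assumed change: the optional trailing
# punctuation is captured as group 2 so it can be re-emitted outside the link.
urlpat = re.compile(r'(\w+://[^>)\s]+?)([.,;]?)(\s|$)')  # URLs in text


def linkify(text):
    """Hypothetical helper: wrap each URL in an anchor tag.

    A single trailing '.', ',' or ';' is kept outside the anchor, followed
    by whatever whitespace (or end of string) terminated the URL.
    """
    return urlpat.sub(r'<A HREF="\1">\1</A>\2\3', text)


if __name__ == '__main__':
    print(linkify('this is another url http://www.yahoo.com,'))
    print(linkify('and so is this http://www.google.com.'))
    # The caveat from the thread: a URL that really ends in '.' loses it.
    print(linkify('see http://www.example.com/some/page#anchor.'))
```

The last call shows the trade-off Mark describes: the final '.' of the
anchor URL is treated as sentence punctuation and left out of the link.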

Thanks again!

-Jim P.