#5606: urlize filter should recognize only the characters which URL RFC specifies.
-----------------------------------+----------------------------------------
   Reporter:  [EMAIL PROTECTED]    |       Owner:  nobody
     Status:  new                  |   Component:  Template system
    Version:  SVN                  |    Keywords:
      Stage:  Unreviewed           |   Has_patch:  0
-----------------------------------+----------------------------------------
 The current implementation of the urlize filter, which uses the code in utils/html.py, recognizes URLs by splitting the text into words.
 In Korean (or Japanese), however, this implementation can cause problems. These languages have '조사' (postpositional particles), which consist of one or more characters attached to the preceding word ''without'' any space. For example,[[BR]]
 `"나는 http://example.com을 추천합니다."` means `"I recommend http://example.com."`[[BR]]
 The character '을' is not part of the URL, but the current urlize implementation treats it as part of the URL.

 Of course, because URLs may themselves contain Unicode Korean characters, deciding which characters to exclude from a URL is not clear-cut. However, such cases are very rare, because most Korean URLs are percent-encoded, like `'http://example.com/tags/%EB%B8%94%EB%A1%9C%EA%B7%B8'` (`'http://example.com/tags/블로그'` in UTF-8 encoding). So I suggest changing the code to accept only US-ASCII characters for URL auto-linking, as specified in [http://www.faqs.org/rfcs/rfc1738.html RFC 1738].

-- 
Ticket URL: <http://code.djangoproject.com/ticket/5606>
Django Code <http://code.djangoproject.com/>
The web framework for perfectionists with deadlines
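
 The suggestion in this ticket could be sketched roughly as follows. This is only an illustration, not the code in utils/html.py and not a proposed patch: the names `URL_CHARS`, `ASCII_URL_RE` and `find_ascii_urls` are hypothetical, only `http://` and `https://` prefixes are handled, and `#` and `~` are included as a pragmatic assumption even though RFC 1738 does not list them among the characters allowed unencoded inside a URL.

```python
import re

# ASCII characters RFC 1738 allows in a URL: letters, digits, the "safe" and
# "extra" punctuation, the reserved characters, and "%" for escapes.
# "#" and "~" are added as an assumption (common in practice, not from the RFC list).
URL_CHARS = r"A-Za-z0-9$\-_.+!*'(),;/?:@=&%#~"

# Hypothetical pattern: a scheme followed by one or more allowed ASCII characters.
ASCII_URL_RE = re.compile(r"https?://[" + URL_CHARS + r"]+")


def find_ascii_urls(text):
    """Return the ASCII-only URL candidates found in text.

    Matching stops at the first character outside URL_CHARS, so a Korean
    particle glued to the end of a URL is left out of the match.
    """
    return [match.group(0) for match in ASCII_URL_RE.finditer(text)]


if __name__ == "__main__":
    print(find_ascii_urls("나는 http://example.com을 추천합니다."))
    print(find_ascii_urls("http://example.com/tags/%EB%B8%94%EB%A1%9C%EA%B7%B8"))
```

 With this sketch the particle '을' stays outside the match, while a percent-encoded URL is matched in full. Trailing ASCII punctuation such as a sentence-final `.` would still be captured and would need separate handling.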
