On Fri, Oct 01, 2010 at 06:23:06PM +0200, PyroPeter wrote:
> On 10/01/2010 05:52 PM, Lukas Fleischer wrote:
> >This won't match URLs like
> >"https://aur.archlinux.org/packages.php?O=0&K=" and an ampersand at the
> >end of a URL won't be converted correctly :/ I'll try to implement it in a
> >more proper way in the next few days. Maybe I'll actually go with splitting
> >comments at link boundaries as you suggested before... :)
>
> Well, that's the problem. Which characters should belong to the end of
> the URL, and which should not? There could also be cases in which
> punctuation belongs to the URL. If punctuation is parsed as not
> belonging to the URL, there would be no way to post a working link
> to certain URLs. If punctuation is parsed as part of the URL, one could
> insert a space between the URL and the punctuation that should not
> belong to the URL. One should also consider that inserting a URL into
> a sentence looks horrible and is normally not done (by me, at least).
>
> About splitting at boundaries: Contrary to what I said before,
> using regular expressions seems to be a valid and efficient way.
> (I thought you would have to escape tag content and attributes in
> different ways (percent-encoding vs. HTML entities). After reading
> the HTML4 specification I realized this is not the case, as content and
> attributes are both escaped using HTML entities.)
>
> Regards, PyroPeter
> --
> freenode/pyropeter "12:50 - I press Return."
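A rough sketch of the split-at-boundaries approach from the quoted message, written in Python with the standard `re` and `html` modules. The URL pattern is a deliberately simple, hypothetical one (not the pattern the AUR actually uses). Each match splits the comment, and both the plain text and the href attribute are escaped with HTML entities, so the "&" in the example URL becomes "&amp;" in both places.

```python
import html
import re

# Hypothetical, simplified URL pattern for illustration only:
# "http" or "https", then a run of characters that are not whitespace,
# double quotes or angle brackets.
URL_RE = re.compile(r'https?://[^\s<>"]+')


def linkify(comment: str) -> str:
    """Split a comment at URL boundaries and escape every piece.

    Text between links and the href attribute are both escaped with
    HTML entities via html.escape, so no percent-encoding is needed.
    """
    parts = []
    pos = 0
    for match in URL_RE.finditer(comment):
        # Plain text before this URL.
        parts.append(html.escape(comment[pos:match.start()]))
        # The same entity escaping works for the attribute and the link text.
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}">{url}</a>')
        pos = match.end()
    # Plain text after the last URL (or the whole comment if there was none).
    parts.append(html.escape(comment[pos:]))
    return "".join(parts)
```

For example, `linkify('See https://aur.archlinux.org/packages.php?O=0&K= & <b>')` yields a link whose href and text both contain `O=0&amp;K=`, while the stray `&` and `<b>` outside the link are escaped as ordinary text.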
I didn't read the whole thread, but as far as I understand, you're looking for a proper way to reliably find URLs in comments. John Gruber's regex seems well suited for this:

http://daringfireball.net/2010/07/improved_regex_for_matching_urls

Does this help?

Jan-Erik (badboy_)
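Gruber's full regex is not reproduced here. To illustrate the idea behind it, the following is a much simpler Python pattern. It is an assumption of this sketch and not Gruber's exact regex. A URL may contain punctuation internally, but its last character may not be whitespace, an angle bracket, or common sentence punctuation. This drops a trailing full stop or closing parenthesis and keeps URLs ending in "=", such as the AUR search URL from earlier in the thread. Unlike Gruber's regex, it does not preserve balanced parentheses such as those in Wikipedia links.

```python
import re

# Simplified pattern in the spirit of Gruber's regex, NOT his exact regex:
# the body may contain punctuation, but the final character must not be
# whitespace, "<", ">" or one of . , ; : ! ? ' " ) ]
URL_RE = re.compile(r"""https?://[^\s<>]*[^\s<>.,;:!?'")\]]""")


def find_urls(text: str) -> list[str]:
    """Return every URL in text, with trailing sentence punctuation dropped."""
    return URL_RE.findall(text)
```

With this rule, a sentence such as "Fixed, see https://example.com/foo." yields `https://example.com/foo`. A commenter who really needs a trailing "." in the link has no way to express it, which is the trade-off PyroPeter describes.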