On Thu, Sep 30, 2010 at 08:56:56PM +0200, PyroPeter wrote:
> Well, but you are encoding existing entities, that are not "&amp;", as
> "&amp;foo;". See the example below.
Yep, and that's how it's supposed to be. There shouldn't be any entities
that users put in the comments and that are not encoded.

> I see, "$var[] = foo" creates the array $var if necessary and appends
> foo.

Correct.

> Imo, you should split the message at the link boundaries.
> ( "foo ", "http://foo.bar.tld", " baz")
> Then you should encode the html-entities in all elements, wrap the links
> in <a>'s, and then join all that together.

Yes... That would be cleaner, but it would also be way more complicated to
implement and would require huge amounts of code just for making links
clickable.

> == example 1 ==
>
> input: "foo http://foo.tld/iLikeToUseApersands/foo&bar.html baz"
>
> If I am not mistaken, $regex would be
> "/http://foo.tld/iLikeToUseApersands/foo&bar.html/msS"
> (are the "/" correctly escaped? I will assume they are.)
>
> Then, $regex would be:
> "/http:\/\/foo\.tld\/iLikeToUseApersands\/foo&bar\.html/msS"
>
> $comment would be set by htmlspecialchars() to:
> "foo http://foo.tld/iLikeToUseApersands/foo&amp;bar.html baz"
>
> => preg_replace_callback() would not match, as & got replaced.

Why should it not work? preg_replace_callback() still matches if the URL
contains a semicolon. This will be parsed and output as a valid link
(tested with the current GIT version and the patch applied).

> You can also link to a homepage using valid URL's. The additional
> "feature" may be nice, but makes the code more complex. It also
> trains users to omit the "http://" and produces more work for devs,
> as they all now have to parse this invalid hostname+path stuff.

Hm, that's a question of taste. We'll let Loui decide :p

> Unrelated: You seem to accept only a-zA-Z in hostnames? Or does
> PHP's \w include 0-9 and language-dependent letters? What about
> underscores?

"\w" in Perl-compatible regular expressions matches all alphanumeric
characters plus the underscore ("_").

> Why does the <a>'s content only include the Path of the URL?

It doesn't.
The "<a></a>"'s content contains exactly what the user typed (with special
chars converted by htmlspecialchars()). Please don't just assume things; in
future, test your examples using a current GIT checkout with the patch
applied.
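For reference, here is a minimal PHP sketch of the escape-first order described above. It is not the code from the actual patch: the function name linkify_comment() and the URL pattern are hypothetical. The whole comment goes through htmlspecialchars() first, then preg_replace_callback() wraps URLs in <a> tags. Because the assumed character class accepts both "&" and ";", the "&amp;" produced for example 1 stays inside the match.

```php
<?php
// Hypothetical sketch, not the actual patch: escape first, then linkify.
function linkify_comment(string $comment): string
{
    // Escape everything the user typed ("&" -> "&amp;", "<" -> "&lt;", ...).
    $escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

    // Assumed URL pattern: "http://" or "https://" followed by common URL
    // characters. "&" and ";" are included, so "&amp;" stays in the match.
    $pattern = '!https?://[\w.\-/?#=%&;:+~]+!';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[0] is already escaped, so it is safe both as the href value
        // and as the link text, which is exactly what the user typed.
        return '<a href="' . $m[0] . '">' . $m[0] . '</a>';
    }, $escaped);
}

// Example 1 from the thread:
echo linkify_comment('foo http://foo.tld/iLikeToUseApersands/foo&bar.html baz'), "\n";
// foo <a href="http://foo.tld/iLikeToUseApersands/foo&amp;bar.html">http://foo.tld/iLikeToUseApersands/foo&amp;bar.html</a> baz

// "\w" covers letters, digits and the underscore:
var_dump(preg_match('/^\w+$/', 'aur_dev2010')); // int(1)
```

One caveat of this particular pattern: since "&" and ";" are accepted, an escaped character directly following a URL (e.g. "&lt;") would end up inside the link.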
