On Thu, Mar 13, 2003 at 07:58:05PM +0100, Josip Rodin wrote:
> On Thu, Mar 13, 2003 at 06:46:35PM +0100, Frank Lichtenheld wrote:
> > > The right fix would be simply
> > > $long_desc =~ s,<(?:URL:\s*)?(http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;
> > >
> > > Right?
> >
> > Yours would do also. The main difference in result is that you delete
> > the 'URL:' while mine preserves it. Only a cosmetic difference.
>
> Actually I did that off the top of my head, focusing on the [^>] part.
> I thought that the "URL:" part was included in the anchor, but I guess
> that's handled by some other part of the code.

Ok. Let's elaborate a little. Sorry if it's too long.

$long_desc =~ s,<((URL:)?http://[\S~-]+?/?)>,\&lt\;$1\&gt\;,go;
                 ^^    ^ ^                ^
                 12    2 X                1

That's the original regex. What ends up in the output is the first
capture ($1), i.e. everything that's matched between '(' 1 and ')' 1.
The problem in the bug was that no whitespace was allowed at point X,
so I inserted \s* at this place.

$long_desc =~ s,<((URL:)?\s*http://[\S~-]+?/?)>,\&lt\;$1\&gt\;,go;
                         ^^^

In your regex

$long_desc =~ s,<(?:URL:\s*)?(http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;
                 ^         ^ ^            ^ ^
                 1         1 2            2 Y

only what's between '(' 2 and ')' 2 is included (because the '?:' in
the first parentheses makes that group non-capturing). So the 'URL:' is
discarded. Whether you write [\S~-]+?> or [^>]+> should make no big
difference (you are allowing more chars, e.g. whitespace), especially
because the first one is a non-greedy match and so stops at the first
'>' anyway. The \s* at Y is a good addition by you.

> > > > + $long_desc =~ s/\&/\&amp\;/go;
> The problem is that if someone puts a proper &amp; in a URL, your regexp
> would happily convert it to &amp;amp; :)

But why would someone do this? The main place where a long description
is displayed is a package manager (dselect/aptitude), not a website. I
would consider this a bug in the package, not in the code. But if you
really want to allow this, you have to write something like:

$long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go;

The negative lookahead skips any '&' that already starts an entity
(named like &amp; or numeric like &#38; / &#x26;). Seems to work well,
but no warranty.

Happy regexing ;)

Greetings,
Frank

-- 
*** Frank Lichtenheld <[EMAIL PROTECTED]> ***
          *** http://www.djpig.de/ ***
see also: - http://www.usta.de/
          - http://fachschaft.physik.uni-karlsruhe.de/
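
For anyone who wants to try the patterns from this mail side by side, here is a minimal Perl sketch. The sub names and the sample strings are made up for illustration and are not from the real code. It also assumes the replacement text is meant to be the HTML entities &lt;, &gt; and &amp;. The backslashes before '&' and ';' in the mail are harmless in Perl, so the sketch leaves them out.

```perl
use strict;
use warnings;

# Hypothetical helpers; each one wraps a single substitution from the mail.

# Original pattern: nothing may stand between "URL:" and "http://".
sub link_original {
    my ($desc) = @_;
    $desc =~ s,<((URL:)?http://[\S~-]+?/?)>,&lt;$1&gt;,g;
    return $desc;
}

# Pattern with \s* inserted at point X: "URL:" stays inside capture 1.
sub link_keep_prefix {
    my ($desc) = @_;
    $desc =~ s,<((URL:)?\s*http://[\S~-]+?/?)>,&lt;$1&gt;,g;
    return $desc;
}

# Pattern with the non-capturing (?:...) group: "URL:" is matched but dropped.
sub link_drop_prefix {
    my ($desc) = @_;
    $desc =~ s,<(?:URL:\s*)?(http://[^>]+)\s*>,&lt;$1&gt;,g;
    return $desc;
}

# Naive escaping: every '&' is rewritten, including existing entities.
sub amp_naive {
    my ($desc) = @_;
    $desc =~ s/&/&amp;/g;
    return $desc;
}

# Lookahead variant: leave '&' alone when it already starts an entity.
sub amp_safe {
    my ($desc) = @_;
    $desc =~ s/&(?!(?:#x?[\da-fA-F]+|\w+);)/&amp;/g;
    return $desc;
}

my $url_text = 'see <URL: http://example.org/foo>';
print link_original($url_text),    "\n";
print link_keep_prefix($url_text), "\n";
print link_drop_prefix($url_text), "\n";

my $amp_text = 'a & b &amp; c &#38; d';
print amp_naive($amp_text), "\n";
print amp_safe($amp_text),  "\n";
```

With the sample input, the original pattern leaves the text untouched because of the space after "URL:", the second keeps "URL: " inside the escaped brackets, and the third drops it. Note that the lookahead variant cannot tell a real entity from a query parameter that happens to be followed by ';', so it is a heuristic rather than a full parser.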

