On Thursday 24 January 2002 15:34, Geoffrey Young wrote:
> > HTML::Entities correctly turns \x8b into &#x8B; while Apache::Util leaves
> > it untouched. That character is treated by certain buggy browsers as <
> > and can thus be used to fake tags. Note that just because your browser
> > isn't vulnerable (ie it doesn't buy the fake h1) doesn't mean that the
> > problem isn't there :-) The source makes it explicit.
> >
> > This is with 1.25 but I don't think it has changed since. The solution is
> > to do what HTML::Entities does, which is basically sprintf "&#x%X;",
> > ord($char) for control and high-bit chars. I'd submit a patch but I'm not
> > too fluent with C/XS.
>
> I'm probably worse with C than Robin, but here's a patch that seems to fix
> the problem (as I understand it, that is).
>
> The solution is different from HTML::Entities in that it always uses
> numeric entities for characters between 126 and 255, whereas HTML::Entities
> uses named ones like &cedil;
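For readers less fluent in C/XS, here is a rough sketch of the escaping Robin describes: the HTML metacharacters get named entities, and control and high-bit bytes get sprintf-style "&#x%X;" numeric entities. This is not the actual Apache::Util or HTML::Entities code; the function name escape_html_bytes is hypothetical, and it assumes a single-byte encoding (each byte is one character). Leaving tab, LF and CR untouched is also an assumption of this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Apache::Util's real XS code).
 * HTML-escapes a NUL-terminated string into out, treating every byte as one
 * character, i.e. assuming a single-byte encoding such as ISO-8859-1.
 * outsize must be at least 1. Output is truncated at an entity boundary
 * rather than overflowing. Returns bytes written, excluding the NUL. */
size_t escape_html_bytes(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        char buf[8];          /* big enough for "&#xFF;" plus NUL */
        const char *rep;
        switch (*p) {
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '&': rep = "&amp;";  break;
        case '"': rep = "&quot;"; break;
        case '\t': case '\n': case '\r':
            /* Assumption: common whitespace passes through unescaped. */
            buf[0] = (char)*p;
            buf[1] = '\0';
            rep = buf;
            break;
        default:
            if (*p < 0x20 || *p > 0x7E) {
                /* Control and high-bit bytes, like sprintf "&#x%X;", ord($char) */
                snprintf(buf, sizeof buf, "&#x%X;", (unsigned)*p);
            } else {
                buf[0] = (char)*p;   /* printable ASCII is copied as-is */
                buf[1] = '\0';
            }
            rep = buf;
        }
        size_t len = strlen(rep);
        if (n + len + 1 > outsize)
            break;                   /* no room for this entity and the NUL */
        memcpy(out + n, rep, len);
        n += len;
    }
    out[n] = '\0';
    return n;
}
```

For example, escaping "\x8b" "h1>" yields "&#x8B;h1&gt;", so the fake tag can no longer be formed. Because the sketch works one byte at a time, a multi-byte UTF-8 character would come out as several separate entities.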
The latter part doesn't matter, as browsers now recognize numeric entities the
vast majority of the time (and when they don't, they also don't recognize the
very extended entities that HTML::Entities has). However, I'm not sure your
patch does the right thing regarding UTF-8, unless there's some magic involved
that I'm not seeing :-/ I'm no expert on how to deal with UTF-8 in C (or even
in Perl), but it looks like you're only addressing 8-bit encodings.

-- 
_______________________________________________________________________
Robin Berjon <[EMAIL PROTECTED]> -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
-----------------------------------------------------------------------
Earth is a beta site.