On Thursday 24 January 2002 15:34, Geoffrey Young wrote:
> > HTML::Entities correctly turns \x8b into &#x8B; while Apache::Util leaves
> > it untouched. That character is treated by certain buggy browsers as <
> > and can thus be used to fake tags. Note that just because your browser
> > isn't vulnerable (i.e. it doesn't buy the fake h1) doesn't mean that the
> > problem isn't there :-) The source makes it explicit.
> >
> > This is with 1.25 but I don't think it has changed since. The solution is
> > to do what HTML::Entities does, which is basically sprintf "&#x%X;",
> > ord($char) for control and high-bit chars. I'd submit a patch but I'm not too
> > fluent with C/XS.
>
> I'm probably worse with C than Robin, but here's a patch that seems to fix
> the problem (as I understand it, that is).
>
> the solution is different from HTML::Entities in that it always uses the
> numeric form (e.g. &#184;) for characters between 126 and 255, whereas
> HTML::Entities uses named entities like &cedil;

The latter part doesn't matter as browsers now recognize numeric entities the
vast majority of the time (and when they don't, they don't recognize the more
obscure named entities that HTML::Entities emits either).

However I'm not sure your patch does the right thing re UTF-8, unless there's 
some magic involved that I'm not seeing :-/ I'm no expert on how to deal with 
UTF-8 in C (or even in Perl) but it looks like you're only addressing 8-bit
encodings.
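
To make the 8-bit concern concrete, here is a rough sketch in C of the
byte-at-a-time approach. This is not the Apache::Util code nor the patch
above; escape_bytes is a hypothetical helper, and the exact set of escaped
characters (named entities for < > & ", numeric for control and high-bit
bytes, leaving tab/newline/CR alone) is an assumption for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for illustration; NOT the Apache::Util code or the
 * patch discussed above. Escapes `len` bytes of `in` into `out`, treating
 * every byte as one character (i.e. an 8-bit encoding such as Latin-1).
 * Returns the length written, or -1 if `out` is too small. */
static int escape_bytes(const unsigned char *in, size_t len,
                        char *out, size_t outsz)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        char buf[8];                /* "&#xFF;" plus the NUL fits */
        const char *rep;

        if (c == '<')       rep = "&lt;";
        else if (c == '>')  rep = "&gt;";
        else if (c == '&')  rep = "&amp;";
        else if (c == '"')  rep = "&quot;";
        else if ((c < 0x20 && c != '\t' && c != '\n' && c != '\r')
                 || c >= 0x7F) {
            /* control or high-bit byte: numeric entity, as with
             * sprintf "&#x%X;", ord($char) in Perl */
            snprintf(buf, sizeof buf, "&#x%X;", (unsigned)c);
            rep = buf;
        } else {
            buf[0] = (char)c;       /* plain ASCII passes through */
            buf[1] = '\0';
            rep = buf;
        }

        size_t n = strlen(rep);
        if (o + n + 1 > outsz)      /* keep room for the final NUL */
            return -1;
        memcpy(out + o, rep, n);
        o += n;
    }

    if (o + 1 > outsz)
        return -1;
    out[o] = '\0';
    return (int)o;
}
```

Fed the bytes '<', 'h', '1', '>', 0x8B this gives "&lt;h1&gt;&#x8B;", which
is the wanted behaviour for an 8-bit encoding. Fed the UTF-8 encoding of
e-acute (the two bytes 0xC3 0xA9) it gives "&#xC3;&#xA9;", i.e. two
entities for one character, where a UTF-8-aware version would decode the
sequence first and emit "&#xE9;".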

-- 
_______________________________________________________________________
Robin Berjon <[EMAIL PROTECTED]> -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
-----------------------------------------------------------------------
Earth is a beta site.
