On Fri, Nov 22, 2002 at 03:03:08PM -0800, Ian Clarke wrote:
> > Are they? The safest thing is certainly to block anything we don't
> > understand.
> 
> True, ideally we should be using something like JTidy to parse the HTML
> to XML, then filter it, then spit it out to the browser. The JTidy jar
> is 142k, but this will slow things down. Additionally, I think JTidy
> relies on the XML stuff in post-1.1 versions of Java.

I did ask around for a good HTML parser months ago but got no response.
Can we bundle JTidy? Does it do CSS as well?

> 
> Basically, to be 100% safe, any given piece of HTML should be assumed
> *insecure* unless we can affirm that it isn't. Easier said than done
> though.

Yeah. That's the point. It _should_ be simple enough, if tedious when we
can't use existing code, to just parse the HTML and only let through
what is known to be good.

BUT we have to be really careful with I18N; a lot of products have been
caught out by that. So if we just want to ignore I18N, we need to block
all high (non-ASCII) characters in what should be the main text. And if
we want to support it, we need to parse it really carefully, i.e. decode
it all to UCS4/wchars before trying to parse the HTML at all... but we
would also need to make sure the result is not a threat to
non-I18N-aware browsers, so... it could be a lot of work. For little
obvious benefit, but it absolutely must be done before 1.0. Volunteers
with experience in this area would be greatly appreciated; otherwise
I'll end up doing it at some point.

> 
> Ian.
> 
> -- 
> Ian Clarke ian@[freenetproject.org|locut.us|cematics.com]
> Latest Project http://cematics.com/kanzi
> Personal Homepage http://locut.us/
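For illustration, a minimal sketch of the "decode first, then only let through what is known good" idea in Java. It is a hand-rolled tokenizer, not JTidy, and assumes a modern JDK (9+) rather than the 1.1-era runtime mentioned above. The class `WhitelistHtmlFilter`, its methods `decodeStrict` and `filter`, and the tag list `ALLOWED_TAGS` are all hypothetical. Emitting non-ASCII text as numeric character references (so the output is pure ASCII and safe for non-I18N-aware browsers) is one assumed option, not a decided design.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;

public class WhitelistHtmlFilter {
    // ASSUMPTION: an illustrative whitelist, not Freenet's actual list.
    // Only bare tags pass; every attribute is dropped.
    private static final Set<String> ALLOWED_TAGS = Set.of(
        "p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "pre", "code");

    /** Decode strictly: malformed or unmappable bytes throw instead of being "repaired". */
    static String decodeStrict(byte[] raw, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                 .onMalformedInput(CodingErrorAction.REPORT)
                 .onUnmappableCharacter(CodingErrorAction.REPORT)
                 .decode(ByteBuffer.wrap(raw))
                 .toString();
    }

    /** Re-emit only whitelisted tags; all text is escaped and forced to ASCII. */
    static String filter(String html) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            int cp = html.codePointAt(i);            // iterate by code point, not UTF-16 unit
            if (cp == '<') {
                int end = html.indexOf('>', i + 1);
                if (end < 0) break;                  // unterminated tag: drop the rest
                String body = html.substring(i + 1, end).trim();
                boolean closing = body.startsWith("/");
                if (closing) body = body.substring(1).trim();
                int n = 0;                           // tag name = leading ASCII letters/digits
                while (n < body.length() && isAsciiAlnum(body.charAt(n))) n++;
                String name = body.substring(0, n).toLowerCase(Locale.ROOT);
                if (ALLOWED_TAGS.contains(name)) {
                    out.append(closing ? "</" : "<").append(name).append('>');
                }                                    // script, a, img, comments etc. vanish
                i = end + 1;
                continue;
            }
            appendText(out, cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    private static boolean isAsciiAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** Escape markup-significant characters; emit non-ASCII as numeric references. */
    private static void appendText(StringBuilder out, int cp) {
        switch (cp) {
            case '&': out.append("&amp;"); return;   // entities in the source are not trusted
            case '>': out.append("&gt;"); return;
            case '"': out.append("&quot;"); return;
            case '\n': case '\r': case '\t': out.append((char) cp); return;
            default: break;
        }
        if (cp < 0x20 || cp == 0x7F) return;          // drop other control characters
        if (cp > 0x7E) out.append("&#").append(cp).append(';');
        else out.append((char) cp);
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] page = "<P onclick=\"x()\">caf\u00e9</p><script>alert(1)</script>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(filter(decodeStrict(page, StandardCharsets.UTF_8)));
    }
}
```

Running `main` prints `<p>caf&#233;</p>alert(1)`: the `onclick` attribute and the `<script>` tags are gone, and the script body survives only as inert text. Because every attribute is dropped, links and images would need their own equally strict attribute and URL check, and CSS is not handled at all.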
-- 
Matthew Toseland
toad at amphibian.dyndns.org
amphibian at users.sourceforge.net
Freenet/Coldstore open source hacker.
Employed full time by Freenet Project Inc. from 11/9/02 to 11/1/03
http://freenetproject.org/
