On Tue, 24 Apr 2001, mouss wrote:

> The correct way to do it is something like this:
> assume you want to match a URL "u" with a blacklisted URL "b".
> then first decompose each of them to the base components:
> scheme (http, ftp, ...)

I believe the spec. has a user identification component in here.  It's
important because it's extremely poorly separated and can contain
anything- it makes things like http://www.microsoft.com@realURL possible
for some sets of browsers- the ultimate in URLized social engineering :(

(I'm not sure the separator is an at symbol, but I'm just not in a good
enough mood to traipse through the HTTP spec. again.)

[snip]

> [server]
> for the server, there are 4 cases:
>
> 2. both "u" and "b" use hostname based expressions. then the usual
> regex matching is used. so www.playboy.com matches *.playboy.com

I believe you've missed the %dd conversion step, which can be
per-character.  That's what makes HTTP so much fun and pattern matching
on URIs so much pain...

> so all this has been known for a long time, and has been coded and
> documented. but still, people write flawed software.

No doubt in part caused by protocol design specifications that don't
take downstream usage issues into account.  HTTP *sucks* as a protocol-
no length restrictions, no code normalizations, no structure worth
mentioning, poor tokenization....

You know, I bet that everyone on the list passes HTTP, and probably no
more than three people have even done a cursory protocol evaluation on
it.  I wonder who the third person is?

When you look at specs like HTTP and FTP, and then at the list of
protocols "supported" by most firewalls, it's not comforting.

Paul
-----------------------------------------------------------------------------
Paul D. Robertson      "My statements in this message are personal opinions
[EMAIL PROTECTED]      which may have no basis whatsoever in fact."

-
[To unsubscribe, send mail to [EMAIL PROTECTED] with
"unsubscribe firewalls" in the body of the message.]
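[A quick sketch of the userinfo trick described above, using Python's
standard urllib.parse (not part of the original message; the hostname
"evil.example" is made up for illustration). Everything before the "@"
in the authority section is userinfo, not the host, so a browser that
displays only the left-hand part can be fooled:]

```python
from urllib.parse import urlsplit

# "www.microsoft.com" here is the userinfo field, not the destination;
# the request actually goes to evil.example.
u = urlsplit("http://www.microsoft.com@evil.example/login")

print(u.username)  # www.microsoft.com  (userinfo, cosmetic)
print(u.hostname)  # evil.example       (the real server)
```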
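[A minimal sketch of why the %xx decoding step matters for hostname
blacklisting, again in Python; the blacklist patterns and helper name
are hypothetical, but the decode-then-match order is the point:]

```python
from fnmatch import fnmatch
from urllib.parse import unquote, urlsplit

# Hypothetical blacklist, following mouss's example patterns.
BLACKLIST = ["*.playboy.com", "playboy.com"]

def is_blacklisted(url: str) -> bool:
    # Percent-decode and lowercase the hostname *before* matching;
    # without this, "www.%70layboy.com" slips past a naive comparison.
    host = unquote(urlsplit(url).hostname or "").lower()
    return any(fnmatch(host, pat) for pat in BLACKLIST)

print(is_blacklisted("http://www.%70layboy.com/"))  # True
print(is_blacklisted("http://www.example.com/"))    # False
```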
