Vadim Strizhevsky
Fri, 10 Sep 1999 14:05:17 -0700
One of the handlers I wrote seemed to misbehave: it was returning "broken" links. After some digging I found that what I first took for a bug in the handler is actually a combination of non-compliant HTML and what I think is a bug in MakeLinksAbsolute.

Here's what happens. The site uses links like this:

<a href=test.html class=myclass>test</a>

As far as I know this is not fully compliant HTML (?) because the attribute values are unquoted, but I think it should still work. What I get back after GetHtml, however, is:

<a href=http://foo.com/test.html%20class=myclass>test</a>

which of course doesn't work.

I think the problem is that MakeLinksAbsolute matches the whole string "test.html class=myclass", not just the part up to the space, and then URI (I think?) escapes that space as "%20". Why is that?

As a quick fix, I added a space to the excluded characters in the matching expression used by MakeLinksAbsolute:

-----
# Then the interesting part
([^'" >]+)
--------

This solves my problem, but it may not be good in general, because links may contain spaces. For example, the following is valid:

<a href="test.html#foo bar test">test</a>

and would probably fail with my new expression.

Any suggestions? David? I would hate to special-case this individual handler to strip "%20" out of the URLs. That doesn't sound like a good solution.

Thanks,
Vadim
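One way to handle both cases is to match quoted and unquoted href values with separate alternatives: a quoted value may contain spaces, while an unquoted one ends at the first whitespace or '>'. The sketch below is in Python rather than Perl and is not NewsClipper's actual code. The names `make_links_absolute`, `GREEDY_RE` and `HREF_RE` are hypothetical, and `GREEDY_RE` is only an assumed reconstruction of the original expression (the pattern above without the space).

```python
import re
from urllib.parse import urljoin

# Assumed shape of the original pattern: everything up to a quote or '>'.
# With an unquoted value it swallows the following attribute as well.
GREEDY_RE = re.compile(r"""href\s*=\s*['"]?([^'">]+)""", re.IGNORECASE)

# Hypothetical replacement: try a double-quoted value, then a single-quoted
# value, and only then a bare value that stops at whitespace or '>'.
HREF_RE = re.compile(
    r"""(href\s*=\s*)(?:"([^"]*)"|'([^']*)'|([^'"\s>]+))""",
    re.IGNORECASE,
)


def make_links_absolute(html, base):
    """Sketch of a MakeLinksAbsolute-style rewrite (hypothetical helper)."""

    def repl(match):
        prefix, double_q, single_q, bare = match.groups()
        if double_q is not None:
            # Quoted values keep their quotes and may contain spaces.
            return '%s"%s"' % (prefix, urljoin(base, double_q))
        if single_q is not None:
            return "%s'%s'" % (prefix, urljoin(base, single_q))
        # Unquoted value: only the text before the first space was captured,
        # so the following attribute (class=myclass) is left untouched.
        return prefix + urljoin(base, bare)

    return HREF_RE.sub(repl, html)


unquoted = "<a href=test.html class=myclass>test</a>"
quoted = '<a href="test.html#foo bar test">test</a>'

print(GREEDY_RE.search(unquoted).group(1))  # test.html class=myclass
print(make_links_absolute(unquoted, "http://foo.com/"))
print(make_links_absolute(quoted, "http://foo.com/"))
```

Because the quoted alternatives are tried first, the space inside `"test.html#foo bar test"` survives, while the unquoted `test.html` ends at the space before `class=myclass`. Note that `urljoin` does not percent-escape the space; whether Perl's URI module would escape it in the same situation is not checked here.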