newsclipperdevlist  

bug in MakeLinksAbsolute?

Vadim Strizhevsky
Fri, 10 Sep 1999 14:05:17 -0700


One of the handlers I wrote seemed to misbehave. It was returning
"broken" links. After some digging I found that what I first thought
was the handler's bug is actually a combination of non-compliant HTML
and what I think is a bug in MakeLinksAbsolute. Here's what happens.
The site uses the following links:

<a href=test.html class=myclass>test</a>

Now as far as I know this is not fully compliant html (?) because of
absent quotes. But I think it should work. But what I get back after
GetHtml is: 

<a href=http://foo.com/test.html%20class=myclass>test</a>

Which of course doesn't work. Anyway, I think the problem is that
MakeLinksAbsolute matches the whole thing, not just up to the
space. And then URI (I think?) inserts that "%20" in place of the
space. Why is that?
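
To illustrate the suspected mechanism, here is a minimal Python sketch. It is not the NewsClipper code: the pattern `greedy` is a hypothetical stand-in for a matching expression that stops at quotes and `>` but not at spaces, and `urllib.parse` stands in for the URI module. A space is not a legal character in a URI, so a URI library will typically percent-encode it, which would explain the "%20".

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical stand-in for the matching expression: it excludes
# quotes and '>' from the URL, but NOT whitespace.
greedy = re.compile(r"""href=['"]?([^'">]+)""")

tag = '<a href=test.html class=myclass>test</a>'

# The capture runs past the space and swallows the next attribute.
captured = greedy.search(tag).group(1)

# Percent-encode characters that are illegal in a URI (the space),
# then resolve against the page's base URL.
escaped = quote(captured, safe="/:=#?&")
absolute = urljoin('http://foo.com/', escaped)

print(captured)
print(absolute)
```

Under these assumptions the captured text is `test.html class=myclass`, and the resolved link comes out as the broken URL shown above.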

I trivially fixed it by changing MakeLinksAbsolute to include a space
in its matching expression:

-----
      # Then the interesting part
      ([^'" >]+)
-----

Now this solves my problem, but it may not be good in general,
because links may contain spaces. For example, the following is valid:

<a href="test.html#foo bar test">test</a> 

And would probably fail my new test. Any suggestions? David? 
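
One possible direction, sketched in Python rather than in the actual MakeLinksAbsolute code: treat quoted and unquoted attribute values separately, so a quoted value may contain spaces while an unquoted value ends at the first whitespace or `>`. The names `HREF` and `make_links_absolute` are hypothetical, and this handles only `href` attributes.

```python
import re
from urllib.parse import urljoin

HREF = re.compile(r"""
    (\bhref\s*=\s*)        # group 1: the attribute name and '='
    (?:
        "([^"]*)"          # group 2: double-quoted value, spaces allowed
      | '([^']*)'          # group 3: single-quoted value, spaces allowed
      | ([^\s'">]+)        # group 4: unquoted value, ends at whitespace or '>'
    )
""", re.IGNORECASE | re.VERBOSE)

def make_links_absolute(html, base):
    """Resolve every href in html against base, preserving the quoting style."""
    def fix(match):
        prefix = match.group(1)
        dq, sq, bare = match.group(2), match.group(3), match.group(4)
        if dq is not None:
            return f'{prefix}"{urljoin(base, dq)}"'
        if sq is not None:
            return f"{prefix}'{urljoin(base, sq)}'"
        return prefix + urljoin(base, bare)
    return HREF.sub(fix, html)

unquoted = make_links_absolute(
    '<a href=test.html class=myclass>test</a>', 'http://foo.com/')
quoted = make_links_absolute(
    '<a href="test.html#foo bar test">test</a>', 'http://foo.com/')
print(unquoted)
print(quoted)
```

With this split, the unquoted link keeps `class=myclass` as a separate attribute, and the quoted link keeps its spaces inside the quotes, so neither case needs special handling in the individual handler.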

I would hate to special-case this individual handler to strip out "%20" from
the URLs. That doesn't sound like a good solution.

Thanks,
Vadim
