A frequent complaint about the new JavaScript-skipping code in the 3.1.6
HTML.cc parser is that a "<" in the JavaScript code confuses it, so it
misses the closing </script> tag. Here is a patch that fixes this
problem. As far as I can tell, it works correctly and doesn't break
anything else, but I'd appreciate it if others could test it as well.
Apply it from the top-level source directory using
"patch -p0 < this-message-file". If the indentation got mangled along
the way, add -l (--ignore-whitespace).

--- htdig/HTML.cc.orig Wed Jan 9 16:12:31 2002
+++ htdig/HTML.cc Wed Sep 25 11:50:50 2002
@@ -308,6 +308,13 @@ HTML::parse(Retriever &retriever, URL &b
 if (!q)
     break; // Syntax error in the doc. Tag never ends.
 position++;
+if (noindex & TAGscript)
+{   // Special handling in case '<' is part of JavaScript code
+    while (isspace(*position))
+        position++;
+    if (mystrncasecmp((char *)position, "/script", 7) != 0)
+        continue;
+}
 tag = 0;
 tag.append((char*)position, q - position);
 while (isspace(*position))
This patch should also work fine with 3.2.0b4 snapshots on or after
Sunday, August 2, 2001.
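
To make the failure mode and the fix concrete, here is a minimal standalone
C++ sketch of the same check, outside of ht://Dig. It is not the actual
HTML.cc code: findScriptEnd() and startsWithNoCase() are hypothetical names,
std::string stands in for the parser's own string handling, and ordinary tag
processing is reduced to "is this </script>?". With patched == false, every
"<" starts a tag that runs to the next ">", so the "<" in "a < b" can swallow
the ">" of </script>. With patched == true, a "<" inside the script only
counts as a tag when optional whitespace and "/script" follow it, which is
what the patch checks.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ht://Dig): case-insensitive test that
// `word` occurs in `s` at offset `pos`, in the spirit of the patch's
// mystrncasecmp((char *)position, "/script", 7) == 0.
static bool startsWithNoCase(const std::string& s, std::size_t pos,
                             const std::string& word) {
    if (pos > s.size() || s.size() - pos < word.size())
        return false;
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(s[pos + i])) !=
            std::tolower(static_cast<unsigned char>(word[i])))
            return false;
    }
    return true;
}

// Hypothetical sketch: returns the offset of the '<' that opens the
// closing </script> tag, or std::string::npos if none is found.
// `start` should point just past the opening <script ...> tag.
//   patched == false: every '<' starts a "tag" running to the next '>',
//                     so a '<' in the JavaScript can swallow </script>.
//   patched == true:  inside a script, a '<' only counts as a tag when
//                     optional whitespace and "/script" follow it.
std::size_t findScriptEnd(const std::string& html, std::size_t start,
                          bool patched) {
    assert(start <= html.size());
    std::size_t position = start;
    while ((position = html.find('<', position)) != std::string::npos) {
        const std::size_t lt = position;  // offset of this '<'
        const std::size_t q = html.find('>', position);
        if (q == std::string::npos)
            break;  // Syntax error in the doc: tag never ends.
        ++position;  // step past '<'
        if (patched) {
            // The check added by the patch: skip whitespace, then require
            // "/script"; otherwise resume scanning right after this '<'.
            while (position < html.size() &&
                   std::isspace(static_cast<unsigned char>(html[position])))
                ++position;
            if (!startsWithNoCase(html, position, "/script"))
                continue;
        }
        // Ordinary tag handling, reduced to "is this </script>?".
        while (position < q &&
               std::isspace(static_cast<unsigned char>(html[position])))
            ++position;
        if (startsWithNoCase(html, position, "/script"))
            return lt;
        position = q + 1;  // consume everything up to and including '>'
    }
    return std::string::npos;
}
```

For example, with the input "if (a < b) x(); </script><p>After</p>", the
unpatched mode returns std::string::npos, because the "tag" opened by "a <"
runs through the ">" of "</script>", while the patched mode returns the
offset of "</script>".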
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>