A frequent complaint about the new JavaScript-skipping code in 3.1.6's
HTML.cc parser is that a "<" in the JavaScript code confuses it, so it
misses the closing </script> tag.  Here is a patch that fixes this
problem.  As far as I can tell, it works correctly and doesn't break
anything else, but of course I'd appreciate some other testers for
this.  Apply it from the top of the ht://Dig source tree using
"patch -p0 < this-message-file".

--- htdig/HTML.cc.orig  Wed Jan  9 16:12:31 2002
+++ htdig/HTML.cc       Wed Sep 25 11:50:50 2002
@@ -308,6 +308,13 @@ HTML::parse(Retriever &retriever, URL &b
            if (!q)
              break; // Syntax error in the doc.  Tag never ends.
            position++;
+           if (noindex & TAGscript)
+           {   // Special handling in case '<' is part of JavaScript code
+               while (isspace(*position))
+                   position++;
+               if (mystrncasecmp((char *)position, "/script", 7) != 0)
+                   continue;
+           }
            tag = 0;
            tag.append((char*)position, q - position);
            while (isspace(*position))

This patch should also work fine with 3.2.0b4 snapshots on or after
Sunday, August 2, 2001.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
