This patch should fix the problem of the indexer treating part
of a long tag as page text, for tags such as
<A HREF=.... onmouseover="document.write('<BR>foo');">

Previously the indexer treated the closing bracket after BR as
the end of the <A tag, so foo was indexed as page text. The
patch should fix this.
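
To illustrate the difference described above, here is a minimal
standalone sketch. It is not the ASPSeek code itself: the function
names NaiveTagLength and QuotedTagLength are hypothetical and only
mimic the old strchr-based search and a quote-aware search.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Old behaviour (sketch): the tag ends at the first '>' of any kind.
// Returns the tag length including '>', or the length of the rest of
// the string when no '>' is present.
std::size_t NaiveTagLength(const char* s)
{
    assert(s != nullptr);
    const char* e = std::strchr(s, '>');
    return e ? static_cast<std::size_t>(e - s) + 1 : std::strlen(s);
}

// Quote-aware behaviour (sketch): a '>' inside '...' or "..." does not
// close the tag. A quote character of one kind is ignored while a
// quote of the other kind is open.
std::size_t QuotedTagLength(const char* s)
{
    assert(s != nullptr);
    bool inSingle = false;
    bool inDouble = false;
    const char* e = s;
    for (; *e != '\0'; ++e)
    {
        if (*e == '\'' && !inDouble)
            inSingle = !inSingle;
        else if (*e == '"' && !inSingle)
            inDouble = !inDouble;
        else if (*e == '>' && !inSingle && !inDouble)
            return static_cast<std::size_t>(e - s) + 1;
    }
    // Unterminated tag: everything up to the end of the string.
    return static_cast<std::size_t>(e - s);
}
```

For a tag like the one above, NaiveTagLength stops at the '>' of
<BR>, so everything from foo onward would be taken as page text,
while QuotedTagLength runs to the real end of the <A tag.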

Can somebody test it and report success or failure? It may
slow indexing down a little, so if you measure the speed
difference, I would appreciate hearing the result.

--kir.

-------- Original Message --------
From: Kir Kolyshkin <[EMAIL PROTECTED]>
Subject: Re: http://intranet.enovasion.dk/cgi-bin/s.cgi
To: [EMAIL PROTECTED]

Kir Kolyshkin wrote:
> As for the index bug (showing onmouseout), it is caused by the
> symbols <> inside the <AREA... tag.

Please test the attached patch; it should fix your "onmouseout"
problem. To apply it, copy the file to the aspseek directory that
holds the .cpp files (aspseek-1.0.3/src/), then run

patch -p0 < charsets.diff
make
make install

Then clear and reindex everything; it should work. Please report
success or failure. This will be included in 1.0.4 if it works.

--   [EMAIL PROTECTED]      http://kir.sever.net      ICQ 7551596   --
Answers: $1, short: $5, correct: $25, dumb questions are still free.
Index: charsets.cpp
===================================================================
RCS file: /home/cvs/aspseek/src/charsets.cpp,v
retrieving revision 1.22
diff -u -r1.22 charsets.cpp
--- charsets.cpp        2001/01/10 14:01:01     1.22
+++ charsets.cpp        2001/03/06 22:46:25
@@ -641,16 +641,32 @@
 
 int ParseTag(CTag& tag, int& len, char* s, char*& e)
 {
-       e = strchr(s,'>');
-       if (e)
-       {
-               len = e - s + 1;
-       }
-       else
-       {
-               len = strlen(s);
-       }
-//     char* tmp = (char*)alloca(len + 1);
+       // First, find the '>' that closes this tag, skipping quoted text
+       int inquote = 0;
+       int indquote = 0;
+       int finish = 0;
+       e = s;
+       do {
+               e++;
+               switch (*e)
+               {
+               case '>':
+                       if ((!inquote) && (!indquote))
+                               finish = 1;
+                       break;
+               case '\'':
+                       if (!indquote)
+                               inquote = ~inquote;
+                       break;
+               case '"':
+                       if (!inquote)
+                               indquote = ~indquote;
+                       break;
+               case '\0':
+                       finish = 1;
+               }
+       } while (!finish);
+       len = e - s + 1;
        char* tmp = Alloca(len + 1);
        strncpy(tmp, s, len);
        tmp[len] = 0;
