OK, found the bug. s.cgi forgets to escape HTML special characters in
excerpts. Below is a patch for aspseek.cpp. Please correct the code
as needed, since I'm not a C/C++ programmer. buf[2048] may overflow if
an excerpt needs more than 2048 chars, so please add some bounds
checking.

good luck
Muzaffar

-----------------
--- aspseek.cpp.org     Wed Aug 13 07:02:37 2003
+++ aspseek.cpp Wed Aug 13 06:17:49 2003
@@ -253,6 +253,14 @@
                                                {
                                                        safe_strcpy(templ->title, "No title");
                                                }
+                                               std::string s;
+                                               char buf[2048];
+                                               for(int i=0; i < doc.m_excerpts.size(); i++) {
+                                                       s = doc.m_excerpts[i];
+                                                       HtmlSpecialChars(s.c_str(), &buf[0]);
+                                                       s = buf;
+                                                       doc.m_excerpts[i] = s;
+                                               }
                                                templ->texts.insert(templ->texts.begin(), doc.m_excerpts.begin(), doc.m_excerpts.end());
 #ifdef UNICODE
                                                templ->textH = &doc.m_excerptsH;
--------------------
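
For the bounds-checking concern, one option is to avoid the fixed-size
buffer entirely and build the escaped text in a std::string, which grows as
needed. The following is only a sketch, not tested against the ASPseek
sources: EscapeHtml and EscapeExcerpts are hypothetical helpers (they are
not ASPseek's HtmlSpecialChars), and it assumes doc.m_excerpts behaves like
a std::vector<std::string>.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not ASPseek's HtmlSpecialChars): returns a copy of
// `in` with the HTML-significant characters replaced by entities. The result
// is a std::string, so there is no fixed-size buffer to overflow.
std::string EscapeHtml(const std::string& in)
{
    std::string out;
    out.reserve(in.size());  // at least as long as the input
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        switch (in[i]) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += in[i];    break;
        }
    }
    return out;
}

// Hypothetical helper: escapes every excerpt in place.
// Assumes the excerpts are stored as a vector of std::string.
void EscapeExcerpts(std::vector<std::string>& excerpts)
{
    for (std::vector<std::string>::size_type i = 0; i < excerpts.size(); ++i) {
        excerpts[i] = EscapeHtml(excerpts[i]);
    }
}
```

With something like this, the loop in the patch would reduce to a single
call such as EscapeExcerpts(doc.m_excerpts), provided the container type
matches the assumption above.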


MM> Hi,

MM> Let's assume that domain.com/1.html has the following line in the code:
MM>  <meta name="Description" content="Last updated on
MM> <!--#config timefmt='%m.%d.%Y'--><!--#flastmod file='$P_DOC'-->">

MM> search.com domain indexes domain.com. When you search for some
MM> keywords and 1.html from domain.com is included in results, commented
MM> (actually SSI) lines are also included without the closing tag:

MM> <tr><td colspan=2 width=100%><font class=tx>...updated  on
MM> <!--#config  timefmt=`%m.%d.%Y`-- <B>...

MM> As you see, the comment is not closed and the HTML gets screwed up...

MM> I understand why commented text here is also indexed - because it's
MM> quoted. But why does s.cgi remove the ">" sign at the end of the comment
MM> so that it stays open?

MM> thanks
MM>  Muzaffar                          mailto:[EMAIL PROTECTED]




-- 
 Muzaffar                            mailto:[EMAIL PROTECTED]
