Ok, found the bug. s.cgi forgets to escape HTML special characters in
excerpts. Below is the patch for aspseek.cpp. Please correct the code
as needed, since I'm not a C/C++ programmer. buf[2048] may overflow if
an escaped excerpt is longer than the buffer, so please add some bounds
checking.
good luck
Muzaffar
-----------------
--- aspseek.cpp.org Wed Aug 13 07:02:37 2003
+++ aspseek.cpp Wed Aug 13 06:17:49 2003
@@ -253,6 +253,14 @@
{
safe_strcpy(templ->title, "No title");
}
+ std::string s;
+ char buf[2048];
+ for(int i=0; i < doc.m_excerpts.size(); i++) {
+ s = doc.m_excerpts[i];
+ HtmlSpecialChars(s.c_str(), &buf[0]);
+ s = buf;
+ doc.m_excerpts[i] = s;
+ }
templ->texts.insert(templ->texts.begin(), doc.m_excerpts.begin(), doc.m_excerpts.end());
#ifdef UNICODE
templ->textH = &doc.m_excerptsH;
--------------------
MM> Hi,
MM> Let's assume that domain.com/1.html has the following line in the code:
MM> <meta name="Description" content="Last updated on
MM> <!--#config timefmt='%m.%d.%Y'--><!--#flastmod file='$P_DOC'-->">
MM> search.com domain indexes domain.com. When you search for some
MM> keywords and 1.html from domain.com is included in results, commented
MM> (actually SSI) lines are also included without the closing tag:
MM> <tr><td colspan=2 width=100%><font class=tx>...updated on
MM> <!--#config timefmt=`%m.%d.%Y`-- <B>...
MM> As you see, the comment is not closed and the HTML gets screwed up...
MM> I understand why the commented text is indexed here: it is inside a
MM> quoted attribute value. But why does s.cgi remove the ">" at the end
MM> of the comment, so that it stays open?
MM> thanks
MM> Muzaffar mailto:[EMAIL PROTECTED]
--
Muzaffar mailto:[EMAIL PROTECTED]