According to Brett Baugh:
> Gilles Detillieux wrote:
> >
> > Could you elaborate on this new site? Is it the same OS version as the
> > old site? Same library versions? I'm assuming the same processor and
> > OS on both, or else I wouldn't expect the binaries to work at all on the
> > new site, but what processor and OS are you using? Are you sure the sites
> > are supposed to be binary compatible? Can you try rebuilding the binaries
> > on the problem site?
>
> I use one single Linux box (dual P-II 233, 128 M ram, 2.0.35 kernel,
> apache 1.2.6 with php2 and php3 modules) to serve about 20 different
> sites (virtual sites). All the sites have everything in common;
> binaries, libraries, OS, the works.
I see. That rules out binary incompatibility, doesn't it! ;-)
> > Sorry to provide a whole lot more questions than answers, but without more
> > details about the environment, we're really working blindly, and can't be
> > of much help. Maybe these questions will lead you to the problem yourself.
>
> That it did. One of our brilliant production people decided that it
> would benefit this one particular client to have TWO title tags in
> each document - a normal one and then one that just repeated the
> contents of the meta description tag - so it would get more
> preferential treatment in search engines. GAAAAH. It's a good thing
> she doesn't work here anymore... heh.
Oooh! Search engine spamming! You were right earlier when you said
something evil and nasty was happening on that site!
> So I guess now the question
> is... can you tell htdig to only grab the first title tag it sees? I
> suppose taking out the second <title> is an option; I doubt anyone
> would notice at this point... but that's a lot of typing.
You could probably insert something like this at the start of the
case 0 clause of the switch statement that handles the title tag, at
line 390 of htdig/HTML.cc (in version 3.1.1), just before in_title is
set to 1:

    if (title.length())
    {
        if (debug)
            cout << "More than one <title> tag in document!"
                 << " (possible search engine spamming)" << endl;
        break;
    }

And again, at the start of the case 1 clause, before resetting in_title
to 0, insert this:

    if (!in_title)
        break;

This should make any additional titles be indexed just like regular text.
I haven't tried it, though, so test carefully. Let me know how it goes.
This may be worth including in the next release.
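
To make the intended behaviour concrete, here is a small standalone
sketch of the same first-title-wins logic. It is not ht://Dig code:
TitleState and its member functions are names made up for this
illustration, and std::string stands in for htdig's own String class.
Only title and in_title correspond to variables in HTML.cc.

```cpp
#include <string>

// Hypothetical stand-in for the title-related parser state in HTML.cc.
struct TitleState {
    std::string title;      // text of the first <title> seen
    std::string body;       // everything indexed as regular text
    bool in_title = false;  // true while inside the accepted <title>

    // Corresponds to the "case 0" clause: an opening <title> tag.
    void open_title() {
        if (!title.empty())
            return;         // a title was already captured, ignore this tag
        in_title = true;
    }

    // Corresponds to the "case 1" clause: a closing </title> tag.
    void close_title() {
        if (!in_title)
            return;         // closes a title we ignored, nothing to reset
        in_title = false;
    }

    // Text between tags goes to the title or to the regular text.
    void text(const std::string &s) {
        if (in_title)
            title += s;
        else
            body += s + " ";
    }
};
```

Feeding it two titles in a row should leave the first one in title and
push the text of the second one into body, which is what the patch to
HTML.cc is meant to do. Note that an empty first title would not block
a second one, since the check is on the length of the stored title.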
> I still can't believe how long I stared at those doc headers without
> seeing that. I guess my brain just filters out certain things without
> asking after, say, the fifth pot of coffee in a day. Thanks for
> putting up with me...
Glad I could steer you in the right direction.
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930