Hi,
I don't know whether this is a bug, but I ran into this problem.
Our so-called content management software (an in-house Vignette)
parses the HTML pages, takes the top, bottom, right, and left
sections, and stores them in the output directory. When the
content is then joined to the template, there is sometimes more than
one title tag. Also, when a file is included via SSI, it sometimes
contains its own title, html, head, and body tags.
To cut a long story short, our HTML design sucks...
Now, when the pages are crawled by ./index, it takes all the
title tags and stores them in the database. Many times when
I search for a particular keyword, it displays...
*TITLE = Information on stocks and investment, India
Infoline.com...Untitled Document.*
Here the "Untitled Document" comes from the include file, which
also had a title tag in it. Both <title> tags are stored
and displayed in the result page.
On our site this affects roughly 85% of the pages.
I am sure this problem also exists on some other sites.
So is it right to take all the title tags from a
page and display them in the result page, or should the indexer
take just the first title tag and store that in the database?
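
For what it's worth, the "first title only" option could be sketched
roughly like this. This is Python, not the indexer's own code, and
FirstTitleParser and first_title are made-up names for illustration,
not anything that exists in ./index:

```python
from html.parser import HTMLParser


class FirstTitleParser(HTMLParser):
    """Collects the text of the first <title> element and ignores the rest."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # currently inside the first <title>
        self.done = False       # first <title> already closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is honoured; later ones (e.g. from SSI
        # includes) are skipped because self.done is already set.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def first_title(html):
    """Return the first title of the page with whitespace collapsed, or ''."""
    parser = FirstTitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

With a page that has one title from the template and a second one from
an included file, only the template title would be kept for storage.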
Just a thought, responses are welcome.
Signing off
Kaps