touché 

-----Original Message-----
From: Jérôme Charron [mailto:[EMAIL PROTECTED] 
Sent: Friday, March 10, 2006 4:34 PM
To: [email protected]; [EMAIL PROTECTED]
Subject: Re: quality of search text


> I think algorithm #1 is what google uses.
> google ignores content that does not change from page to page, as well
> as content that isn't part of a block of text.

Are you sure?
Take a look at these search results:
http://www.google.com/search?hl=en&hs=otT&lr=&c2coff=1&safe=off&client=firefox-a&rls=org.mozilla:en-US:official&pwst=1&q=+site:gamingalmanac.com+global+gaming+almanac
... and you will notice that menus are indexed by google and displayed
in summaries.

But if you can contribute an HtmlParseFilter with the ability to remove
menus and navigation, it will be a real improvement. A first step, which
I developed in a previous project many years ago, is to remove pages that
contain textual content only in links: it avoids indexing frames or
iframes that contain only some navigation text...
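
To make that first step concrete, here is a rough sketch of the "text only in links" check. It is written in Python with the standard html.parser module, purely as an illustration: it is not the code from that earlier project, and it is not a Nutch HtmlParseFilter. The names LinkTextCounter and is_navigation_only are hypothetical, and skipping script, style and title text is an assumption of this sketch.

```python
from html.parser import HTMLParser


class LinkTextCounter(HTMLParser):
    """Counts non-whitespace text characters inside and outside <a> elements."""

    # Assumption of this sketch: text in these elements is not page content.
    SKIPPED = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0   # > 0 while inside an <a> element
        self.skip_depth = 0     # > 0 while inside a skipped element
        self.link_chars = 0     # text characters found inside links
        self.other_chars = 0    # text characters found anywhere else

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        # Ignore whitespace so that indentation between tags is not counted.
        count = len("".join(data.split()))
        if self.anchor_depth:
            self.link_chars += count
        else:
            self.other_chars += count


def is_navigation_only(html):
    """True if the page has text, and all of it sits inside links."""
    counter = LinkTextCounter()
    counter.feed(html)
    counter.close()
    return counter.link_chars > 0 and counter.other_chars == 0


menu_frame = '<html><body><a href="/">Home</a> <a href="/about">About</a></body></html>'
article = '<p>Some real text with <a href="/x">a link</a>.</p>'
print(is_navigation_only(menu_frame), is_navigation_only(article))
```

The check is deliberately strict: a single separator character such as "|" between links counts as non-link text, so a real filter would probably compare the two counts against a threshold instead of requiring zero.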

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
