Comment #11 on issue 75 by omattos: Duplicate 'Most Visited' Thumbnails
http://code.google.com/p/chromium/issues/detail?id=75

This is probably a hard one to fix, but here's a suggestion (a bit wacky,
but it should work):

When a new page is added, its thumbnail image or HTML content could be
compared against those of all 9 existing pages. If it matches an existing
thumbnail too closely, it is considered a duplicate and not added.

I'm guessing it would need to be some kind of "fuzzy" match, to avoid
problems where pages are slightly different on every load (e.g. a page with
the page generation time at the bottom). A fuzzy match on the images
themselves might prove easier than a match on the HTML content: the two
images could just be subtracted and the RMS of the result taken to figure
out whether two sites have the same content.
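
A rough sketch of the image comparison, to make the idea concrete. It is
Python rather than anything from the Chromium tree, and it assumes both
thumbnails have already been reduced to equal-sized grayscale grids of
0-255 values. The names rms_difference, looks_like_duplicate and
RMS_THRESHOLD are made up for illustration, and the threshold value is a
guess that would need tuning against real thumbnails.

```python
import math

# Hypothetical cut-off: below this RMS the thumbnails count as "the same".
RMS_THRESHOLD = 10.0


def rms_difference(img_a, img_b):
    """RMS of the per-pixel difference between two grayscale images.

    Each image is a list of rows, each row a list of 0-255 ints.
    """
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("thumbnails must have the same dimensions")

    # Subtract pixel by pixel and square each difference.
    squared = [
        (a - b) ** 2
        for row_a, row_b in zip(img_a, img_b)
        for a, b in zip(row_a, row_b)
    ]
    if not squared:
        raise ValueError("thumbnails must not be empty")

    # Root of the mean of the squared differences.
    return math.sqrt(sum(squared) / len(squared))


def looks_like_duplicate(img_a, img_b, threshold=RMS_THRESHOLD):
    """True when the two thumbnails differ by no more than the threshold."""
    return rms_difference(img_a, img_b) <= threshold
```

Small per-load differences such as a changing timestamp only touch a few
pixels, so they barely move the RMS, which is the point of comparing this
way rather than requiring an exact match.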

A more "usual" approach might be to allow only one thumbnail from each
domain, which should solve most of these problems.
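
The one-per-domain rule could look something like the sketch below (again
Python, purely illustrative). It assumes the candidate URLs arrive ordered
from most to least visited, and it treats the full hostname as the
"domain", so www.example.com and example.com would still count as
different sites. one_per_domain and its limit parameter are hypothetical
names.

```python
from urllib.parse import urlsplit


def one_per_domain(urls, limit=9):
    """Keep the first URL seen for each hostname, up to `limit` entries."""
    seen_hosts = set()
    kept = []
    for url in urls:
        # hostname is already lower-cased; it is None for URLs without one.
        host = urlsplit(url).hostname or ""
        if host in seen_hosts:
            continue  # a page from this host already has a thumbnail
        seen_hosts.add(host)
        kept.append(url)
        if len(kept) == limit:
            break
    return kept
```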

