Re: quality of search text

Howie Wang Sun, 12 Mar 2006 14:34:12 -0800

I'd agree that (2) is quite important for the end user; Richard's continuous text heuristic may actually work for that. I'd extend the meaning of "continuous block" to ignore inline tags such as SPAN, I, B, TT etc, so only certain tags would actually break the content into chunks. Snippets then would be generated from these chunks alone, ignoring the rest of the content. If this heuristic is applied only at snippet-generation time then Andrzej's concern about missing content is not relevant anymore.
Hmm... I'm not convinced. How would you generate the best snippet from a relevant, but ignored chunk?


Maybe eventually this could be the start of using tags to boost
certain sections of the page as Google probably does. Normal
text blocks would have a boost of 1.0, while stuff within <B>, <H*>
might be boosted by 1.5. Stuff within suspected navigation text
could be de-boosted by 0.25 or something. Maybe that would
be a more appropriate way of handling relevance of navigation
text. It should have some relevance, but not as much as content.
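
To make the tag-boost idea concrete, here is a rough Python sketch, not anything that exists in the codebase. It reads the 1.5 and 0.25 figures above as multipliers, which is an assumption. The names `BoostingParser`, `boosted_segments` and `NAV_TAGS` are made up for illustration, and treating `<nav>` as "suspected navigation" is only a stand-in for whatever real heuristic would detect navigation text.

```python
from html.parser import HTMLParser

# Assumed boost values, loosely taken from the numbers floated above.
EMPHASIS_TAGS = {"b", "h1", "h2", "h3", "h4", "h5", "h6"}
NAV_TAGS = {"nav"}  # hypothetical stand-in for a real navigation detector
EMPHASIS_BOOST = 1.5
NAV_BOOST = 0.25
DEFAULT_BOOST = 1.0


class BoostingParser(HTMLParser):
    """Collects (text, boost) pairs based on the enclosing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.segments = []   # list of (text, boost)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closers.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text:
            return
        # Navigation wins over emphasis: bold text in a menu is still a menu.
        if any(t in NAV_TAGS for t in self.open_tags):
            boost = NAV_BOOST
        elif any(t in EMPHASIS_TAGS for t in self.open_tags):
            boost = EMPHASIS_BOOST
        else:
            boost = DEFAULT_BOOST
        self.segments.append((text, boost))


def boosted_segments(html):
    """Return the page text as a list of (text, boost) segments."""
    parser = BoostingParser()
    parser.feed(html)
    parser.close()
    return parser.segments
```

Each segment could then be indexed with its boost, so navigation text still matches but counts for less than body text.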

Maybe the summary text could somehow ignore the de-boosted
sections to improve readability unless the content doesn't have
a better match. You basically construct a snippet giving preference
according to the boost value of the section of text.
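
A minimal sketch of that snippet preference, again in Python and purely illustrative. It assumes the page has already been split into (text, boost) chunks, that a boost below 1.0 marks a de-boosted section, and that "better match" can be approximated by counting query-term hits. The function name `best_snippet` and the `max_len` cutoff are invented here.

```python
import re


def best_snippet(chunks, query, max_len=120):
    """Pick a snippet from (text, boost) chunks, preferring normal content.

    De-boosted chunks (boost < 1.0) are used only when no other chunk
    contains a query term. Returns None if nothing matches at all.
    """
    terms = {t.lower() for t in query.split()}
    best_key, best_text = None, None
    for text, boost in chunks:
        words = re.findall(r"\w+", text.lower())
        hits = sum(1 for w in words if w in terms)
        if hits == 0:
            continue
        # Sort key: normal-or-boosted chunks first, then weighted hit count.
        key = (boost >= 1.0, boost * hits)
        if best_key is None or key > best_key:
            best_key, best_text = key, text
    if best_text is None:
        return None
    return best_text[:max_len]
```

With this ordering a navigation chunk that repeats the query term several times still loses to a content chunk with a single hit, and only gets used when the content has no match.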

This all sounds like a lot of work though :)

Howie
