Including inlink anchor text in index can create irrelevant search results.
---------------------------------------------------------------------------
Key: NUTCH-574
URL: https://issues.apache.org/jira/browse/NUTCH-574
Project: Nutch
Issue Type: Bug
Components: indexer
Environment: All, basic indexing filter
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Fix For: 1.0.0
Currently the basic indexing filter includes inbound anchor text for a given
URL in the index. This sometimes causes pages to show up in search results
for queries they are not relevant to. An example is a search for "dallas
hotels" in our production index (www.visvo.com): Google shows up first,
although no text on the Google home page matches either "dallas" or "hotels".

What is happening is that some inlinks to google.com have anchor text
containing the words "dallas" and "hotels", and that text gets included in the
index entry for google.com. Because google.com also has a very high boost due
to inlinks, it ranks first for these search terms.

I propose we add an option to allow/prevent inlink anchor text from being
included in the index, and set the default for this option to NOT include
inbound link anchor text.
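
To make the proposal concrete, here is a minimal, self-contained sketch of the
behaviour such an option might have. It does not use the Nutch API: the class,
the method names, and the property name "indexer.include.inlink.anchors" are
all hypothetical, and the matching logic is a toy stand-in for the real
indexer and scorer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class AnchorIndexingSketch {

    // Hypothetical option name, made up for this sketch; not a Nutch property.
    static final String INCLUDE_ANCHORS_KEY = "indexer.include.inlink.anchors";

    /** Collects the terms that would be indexed for one page. */
    static List<String> indexedTerms(String pageText,
                                     List<String> inlinkAnchors,
                                     boolean includeAnchors) {
        List<String> terms = new ArrayList<>(
                Arrays.asList(pageText.toLowerCase().split("\\s+")));
        if (includeAnchors) {
            // Current behaviour: anchor text of inbound links is indexed too.
            for (String anchor : inlinkAnchors) {
                terms.addAll(Arrays.asList(anchor.toLowerCase().split("\\s+")));
            }
        }
        return terms;
    }

    /** Toy matcher: true if every query word appears among the indexed terms. */
    static boolean matchesAll(List<String> terms, String query) {
        return terms.containsAll(Arrays.asList(query.toLowerCase().split("\\s+")));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Proposed default: do NOT include inbound anchor text.
        boolean includeAnchors = Boolean.parseBoolean(
                conf.getProperty(INCLUDE_ANCHORS_KEY, "false"));

        String pageText = "Google Search Images Maps";
        List<String> anchors = Arrays.asList("dallas hotels", "search engine");

        List<String> terms = indexedTerms(pageText, anchors, includeAnchors);
        System.out.println("matches 'dallas hotels': "
                + matchesAll(terms, "dallas hotels"));
    }
}
```

With the option left at the proposed default, the page is matched only on its
own text; setting the hypothetical property to "true" would restore the
current behaviour, in which the anchor words become searchable terms for the
target page.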