Including inlink anchor text in index can create irrelevant search results.
---------------------------------------------------------------------------

                 Key: NUTCH-574
                 URL: https://issues.apache.org/jira/browse/NUTCH-574
             Project: Nutch
          Issue Type: Bug
          Components: indexer
         Environment: All, basic indexing filter
            Reporter: Dennis Kubes
            Assignee: Dennis Kubes
             Fix For: 1.0.0


Currently, the basic indexing filter includes inbound anchor text for a given
URL in the index.  This can cause pages to appear in search results for queries
to which they are not relevant.  For example, a search for "dallas hotels" in
our production index (www.visvo.com) returns Google first, although neither
"dallas" nor "hotels" appears anywhere on the Google home page.  The cause is
that some inlinks to google.com have anchor text containing the words "dallas"
and "hotels", and that text is indexed as part of the google.com document.
Because google.com also has a very high boost from its inlinks, it ranks first
for these search terms.  I propose adding an option to allow or prevent inlink
anchor text from being included in the index, with the default set to NOT
include inbound link anchor text.
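
To make the proposal concrete, here is a minimal sketch of how such an option
could gate anchor text at indexing time. It is plain Java and does not use the
Nutch API: the class name, the buildDocument method, the property name
"indexer.include.inlink.anchors", and the field names are all hypothetical
stand-ins chosen for illustration, not the actual filter code or an existing
Nutch setting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch only: this is NOT Nutch's basic indexing filter.
// Every name below is hypothetical.
class AnchorIndexingSketch {

    // Hypothetical property name; the issue does not specify one.
    static final String INCLUDE_ANCHORS_KEY = "indexer.include.inlink.anchors";

    // Builds a simplified "index document" as a map of field name to values.
    static Map<String, List<String>> buildDocument(
            String url, String content, List<String> inlinkAnchors, Properties conf) {
        // Proposed default: do NOT index inbound anchor text.
        boolean includeAnchors =
                Boolean.parseBoolean(conf.getProperty(INCLUDE_ANCHORS_KEY, "false"));

        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("url", List.of(url));
        doc.put("content", List.of(content));

        // Anchor text is added only when the option is explicitly enabled.
        if (includeAnchors && !inlinkAnchors.isEmpty()) {
            doc.put("anchor", new ArrayList<>(inlinkAnchors));
        }
        return doc;
    }

    public static void main(String[] args) {
        List<String> anchors = List.of("dallas hotels");
        Properties conf = new Properties();

        // Option unset: the default applies and no anchor field is indexed.
        Map<String, List<String>> doc =
                buildDocument("http://www.google.com/", "Google Search", anchors, conf);
        System.out.println(doc.containsKey("anchor")); // prints "false"

        // Option enabled: anchor text is indexed, as in the current behavior.
        conf.setProperty(INCLUDE_ANCHORS_KEY, "true");
        doc = buildDocument("http://www.google.com/", "Google Search", anchors, conf);
        System.out.println(doc.containsKey("anchor")); // prints "true"
    }
}
```

With the option left unset, a query for "dallas hotels" would have no anchor
field to match on the google.com document, so the page's inlink boost alone
would not bring it into the results for those terms.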

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
