[ https://issues.apache.org/jira/browse/NUTCH-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-734.
----------------------------------------

    Resolution: Won't Fix

This is simply not required and dated. Plus, I assume that by referring to "a" we mean stop words. These are filtered out during the IR process in (all?) modern indexing servers.

> option to filter "a" tag text
> -----------------------------
>
>                 Key: NUTCH-734
>                 URL: https://issues.apache.org/jira/browse/NUTCH-734
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>            Reporter: ron
>
> Motivation:
> When fetching pages with "menu links", the menus (for example, search) appear
> on all pages of the site. Searching for the word "search" then returns all
> pages of the site, instead of just returning the search page.
> Change request:
> Add options to filter the text of "a" tags, or, more generally, add filters
> that skip text within specific tags.
> I have worked around this by changing DOMContentUtils.getTextHelper:
> if (nodeType == Node.TEXT_NODE && !(currentNode.getParentNode() != null
>     && "a".equalsIgnoreCase(currentNode.getParentNode().getNodeName())))
> - Ron
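The quoted workaround drops a text node only when its immediate parent is an "a" element, so link text nested one level deeper (e.g. `<a><b>About</b></a>`) would still be indexed. The following is a minimal sketch of the more general change request: a configurable set of tag names whose entire subtree is skipped during text extraction. It is a standalone illustration over `org.w3c.dom`, not Nutch's `DOMContentUtils` code. The class `TagFilteringTextExtractor` and its `excludedTags` option are hypothetical, and the example assumes well-formed XHTML parsed with the JDK's `DocumentBuilder`, whereas Nutch's HTML parser plugin builds its DOM with a lenient HTML parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical extractor illustrating "filters to avoid texts within specific tags".
class TagFilteringTextExtractor {

    // Hypothetical option: lower-cased element names whose text is dropped.
    private final Set<String> excludedTags;

    TagFilteringTextExtractor(Set<String> excludedTags) {
        this.excludedTags = excludedTags;
    }

    /** Returns the text under the given node, skipping excluded elements entirely. */
    String getText(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    private void collect(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && excludedTags.contains(node.getNodeName().toLowerCase(Locale.ROOT))) {
            // Skip the whole subtree, so nested markup such as <a><b>..</b></a> is also dropped.
            return;
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Collapse whitespace and separate adjacent text runs with a single space.
            String text = node.getNodeValue().replaceAll("\\s+", " ").trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(text);
            }
            return;
        }
        // Document and non-excluded element nodes: recurse into children in order.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), sb);
        }
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body>"
                + "<div><a href=\"/search\">Search</a> <a href=\"/about\"><b>About</b></a></div>"
                + "<p>Nutch crawls pages.</p>"
                + "</body></html>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

        TagFilteringTextExtractor keepAnchors = new TagFilteringTextExtractor(Set.of());
        TagFilteringTextExtractor dropAnchors = new TagFilteringTextExtractor(Set.of("a"));

        System.out.println(keepAnchors.getText(doc)); // Search About Nutch crawls pages.
        System.out.println(dropAnchors.getText(doc)); // Nutch crawls pages.
    }
}
```

Skipping the whole subtree, rather than checking only the parent node as in the workaround, is a design choice: it also removes text inside markup nested in a link, at the cost of never indexing any link text for the excluded tags.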