option to filter "a" tag text
-----------------------------

                 Key: NUTCH-734
                 URL: https://issues.apache.org/jira/browse/NUTCH-734
             Project: Nutch
          Issue Type: New Feature
    Affects Versions: 1.0.0
            Reporter: ron

Motivation: When fetching pages with "menu links", the menus (for example, a "search" link) appear on every page of the site. Searching for the word "search" then returns all pages of the site instead of just the search page.

Change request: Add an option to filter out the text of "a" tags or, more generally, add filters that exclude text within specific tags.

I have worked around this by changing DOMContentUtils.getTextHelper:

    if (nodeType == Node.TEXT_NODE
        && !(currentNode.getParentNode() != null
             && "a".equalsIgnoreCase(currentNode.getParentNode().getNodeName())))

- Ron
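
To illustrate the more general form of the request (excluding text inside a configurable set of tags), here is a minimal standalone sketch. It is not Nutch code: the class TagTextFilter and the methods collectText and extract are hypothetical names, and it parses well-formed XHTML with the JDK's own DOM parser rather than going through Nutch's HTML parsing. It also differs from the workaround above, which only checks a text node's direct parent: this sketch skips the entire subtree of an excluded tag, so text nested deeper inside a link (for example in a "b" inside an "a") is dropped as well.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Nutch: extracts text from a DOM tree
// while ignoring everything inside a configurable set of tags.
class TagTextFilter {

    // Appends the text found under 'node' to 'out', skipping any subtree
    // whose root element name (lower-cased) is in 'excludedTags'.
    static void collectText(Node node, Set<String> excludedTags, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && excludedTags.contains(node.getNodeName().toLowerCase(Locale.ROOT))) {
            return; // excluded tag: ignore the element and all of its descendants
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, excludedTags, out);
        }
    }

    // Parses a well-formed XHTML string and returns its text without the excluded tags.
    static String extract(String xhtml, Set<String> excludedTags) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        collectText(doc.getDocumentElement(), excludedTags, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"/search\">Search</a>"
                + "<p>Nutch crawls pages.</p></body></html>";
        System.out.println(extract(page, Set.of()));    // menu text included
        System.out.println(extract(page, Set.of("a"))); // menu text filtered out
    }
}
```

In Nutch itself the set of excluded tags would presumably come from a configuration property rather than being passed in directly; no such property is assumed to exist here.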