[ https://issues.apache.org/jira/browse/NUTCH-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-734.
----------------------------------------
Resolution: Won't Fix
This is simply not required, and the request is dated. I also assume that "a"
here refers to stop words. These are filtered during the IR process in (all?)
modern indexing servers.
> option to filter "a" tag text
> -----------------------------
>
> Key: NUTCH-734
> URL: https://issues.apache.org/jira/browse/NUTCH-734
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 1.0.0
> Reporter: ron
>
> Motivation:
> When fetching pages with "menu links", the menus (for example, search) appear
> on all pages of the site. Searching for the word "search" then returns all
> pages of the site, instead of just returning the search page.
> Change request:
> Add an option to filter out the text of "a" tags or, more generally, add
> filters that exclude text within specific tags.
> I have worked around this by changing DOMContentUtils.getTextHelper:
>   if (nodeType == Node.TEXT_NODE && !(currentNode.getParentNode() != null
>       && "a".equalsIgnoreCase(currentNode.getParentNode().getNodeName())))
> - Ron
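
The more general form of the request (skipping text inside configurable tags) could be sketched roughly as below. This is a standalone illustration, not Nutch code: the class `TagTextFilter` and its methods `parse` and `extractText` are hypothetical names, and the input is assumed to be well-formed XHTML so that the JDK's XML parser can read it. Unlike the one-line workaround quoted above, which only checks the direct parent of a text node, this sketch skips the whole subtree of an excluded element, so text nested deeper inside a link (for example in a `b` inside an `a`) is dropped as well.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch.
public class TagTextFilter {

    /** Parses well-formed XHTML only; real-world HTML would need an HTML parser. */
    public static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /** Collects text below root, ignoring elements whose lower-case name is in excludedTags. */
    public static String extractText(Node root, Set<String> excludedTags) {
        StringBuilder sb = new StringBuilder();
        collect(root, excludedTags, sb);
        return sb.toString();
    }

    private static void collect(Node node, Set<String> excludedTags, StringBuilder sb) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && excludedTags.contains(node.getNodeName().toLowerCase(Locale.ROOT))) {
            return; // skip the excluded element and everything below it
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' '); // separate text fragments with a single space
                }
                sb.append(text);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), excludedTags, sb);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"/search\">search</a>"
                + "<p>Welcome to the site</p></body></html>";
        Document doc = parse(page);
        System.out.println(extractText(doc, Set.of("a"))); // Welcome to the site
        System.out.println(extractText(doc, Set.of()));    // search Welcome to the site
    }
}
```

With "a" in the excluded set, a query for "search" would no longer match this page through its menu link, which is the behaviour the issue asks for. Taking the tag names as a set rather than hard-coding "a" is one possible way to make the filter configurable.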