option to filter "a" tag text
-----------------------------

                 Key: NUTCH-734
                 URL: https://issues.apache.org/jira/browse/NUTCH-734
             Project: Nutch
          Issue Type: New Feature
    Affects Versions: 1.0.0
            Reporter: ron


Motivation:
When fetching pages with "menu links", the menus (for example, search) appear 
on all pages of the site. Searching for the word "search" then returns all 
pages of the site, instead of returning just the search page.

Change request:
Add an option to filter out the text of "a" tags or, more generally, add 
filters that exclude the text within specific tags.

I have worked around this by changing the text-node check in 
DOMContentUtils.getTextHelper:

     if (nodeType == Node.TEXT_NODE
         && !(currentNode.getParentNode() != null
              && "a".equalsIgnoreCase(currentNode.getParentNode().getNodeName())))
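
The more general form of the request (skipping text inside a configurable set 
of tags) could look roughly like the sketch below. It is a standalone 
illustration built on the JDK's org.w3c.dom API, not Nutch code: the class 
TagTextFilter, its methods, and the skipTags parameter are hypothetical names, 
and the input is assumed to be well-formed XHTML. Unlike the one-line 
workaround above, which only drops text whose direct parent is an "a" element, 
this sketch drops the whole subtree of a skipped tag, so text nested deeper 
(for example inside a "b" within an "a") is excluded as well.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: collects text from a DOM tree
// while ignoring everything inside a configurable set of tags.
class TagTextFilter {

    // Recursively appends text to 'out', skipping the subtree of any element
    // whose lower-cased name is contained in skipTags.
    static void collectText(Node node, Set<String> skipTags, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && skipTags.contains(node.getNodeName().toLowerCase(Locale.ROOT))) {
            return; // filtered tag: ignore its text and all of its descendants
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), skipTags, out);
        }
    }

    // Parses well-formed XHTML (an assumption of this sketch) and returns the
    // remaining text; skipTags must contain lower-case tag names.
    static String extractText(String xhtml, Set<String> skipTags) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            StringBuilder out = new StringBuilder();
            collectText(doc.getDocumentElement(), skipTags, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }
}
```

Calling TagTextFilter.extractText(page, Set.of("a")) would then leave out the 
menu link text, while an empty set keeps all text. In Nutch itself the set of 
tags would presumably come from a configuration property; no such property 
exists in 1.0.0 as far as this issue states.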

- Ron

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
