Markus Jelsma created NUTCH-1434:
------------------------------------

             Summary: Indexer to delete robots noIndex
                 Key: NUTCH-1434
                 URL: https://issues.apache.org/jira/browse/NUTCH-1434
             Project: Nutch
          Issue Type: New Feature
          Components: indexer
    Affects Versions: 1.5.1
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma
             Fix For: 1.6


Nutch does not handle pages with a robots meta tag containing "noindex" 
properly. All it does is remove the title and content fields from the parsed 
data. It does not stop those pages from being indexed, nor can it delete 
already indexed pages from the index if they later change to noindex.
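
The requested behaviour could be sketched roughly as follows. This is an 
illustrative, standalone Python sketch and not Nutch code: the names 
RobotsMetaParser, has_noindex and decide_action are hypothetical, and it 
assumes the indexer can emit a delete request for a URL whether or not the 
page is currently in the index.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: records whether a robots meta tag says noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html):
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.noindex


def decide_action(html):
    """Return "delete" for noindex pages, "index" otherwise.

    Assumption: sending a delete for a page that was never indexed is
    harmless, so no lookup of the current index state is needed.
    """
    return "delete" if has_noindex(html) else "index"
```

With this approach a page that was indexed earlier and later gains a noindex 
directive produces a delete on the next indexing run, instead of being 
reindexed with empty title and content fields.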

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
